Chapter 7 Advanced Search Algorithms

7.3 Examples of Good Search Systems

Here are some examples of popular and effective search systems:

  1. Google
  • Description: The most widely used search engine globally, known for its powerful algorithms and vast index.
  • Key Features: Advanced search algorithms, personalized search results, integration with various Google services.
  2. Microsoft Bing
  • Description: A strong alternative to Google, offering robust search capabilities and integration with Microsoft products.
  • Key Features: AI-powered search, integration with Microsoft Office, image and video search.
  3. DuckDuckGo
  • Description: A privacy-focused search engine that does not track user activity.
  • Key Features: Anonymous search, no personalized search results, strong privacy policies.

  4. Elasticsearch
  • Description: An open-source search and analytics engine used for a wide range of applications, including log and event data analysis.
  • Key Features: Real-time search, scalability, powerful querying capabilities.
  5. Algolia
  • Description: A hosted search engine API designed for building fast and relevant search experiences.
  • Key Features: Instant search, customizable ranking, analytics.
  6. Solr
  • Description: An open-source search platform from the Apache Lucene project, widely used for enterprise search applications.
  • Key Features: Full-text search, faceted search, scalability.
  7. Amazon Kendra
  • Description: An AI-powered enterprise search service from AWS that enables organizations to search across different content repositories.
  • Key Features: Natural language understanding, machine learning-based relevance tuning, secure access controls.

These search systems are known for their effectiveness, scalability, and ability to handle various types of search queries, making them suitable for different use cases and industries.

7.4 Comparison of Google and Bing’s Algorithms

7.4.1 Google Search Algorithm

  1. Crawling and Indexing:
    • Crawling: Google uses web crawlers (Googlebot) to discover and fetch web pages. It employs an algorithmic process to determine which sites to crawl, how often, and how many pages to fetch from each site.
    • Indexing: After crawling, Google analyzes the content (text, images, videos) and stores the information in its index. This index is a large database that helps Google quickly retrieve relevant information.
  2. Ranking Factors:
    • Relevance: Google assesses the relevance of content based on keywords, user intent, and context. It uses language models to understand the meaning behind queries.
    • Quality: The quality of content is evaluated based on expertise, authoritativeness, and trustworthiness (E-A-T). Google also considers the usability of pages, such as mobile-friendliness and page load speed.
    • User Engagement: Interaction data, such as click-through rates and dwell time, are used to gauge the relevance and usefulness of search results.
  3. Personalization:
    • Location and Settings: Google personalizes search results based on the user’s location, search history, and preferences.
    • Freshness: For certain queries, such as news topics, Google prioritizes fresh content.

7.4.2 Bing Search Algorithm

  1. Crawling and Indexing:
    • Crawling: Bing uses its own web crawlers to discover and fetch web pages. It focuses on finding high-quality and relevant content.
    • Indexing: Bing analyzes the content and stores it in its index, similar to Google. It emphasizes the credibility and authority of the indexed content.
  2. Ranking Factors:
    • Relevance: Bing evaluates the relevance of content based on keywords, context, and user intent. It uses various signals to understand the query and match it with the best results.
    • Quality: Bing assesses the quality of content by considering factors like page load time, mobile-friendliness, and the credibility of the source.
    • User Engagement: Bing uses user interaction data, such as click-through rates and dwell time, to determine the relevance and usefulness of search results.
  3. Personalization:
    • Location and Settings: Bing personalizes search results based on the user’s location, search history, and preferences.
    • Content Moderation: Bing takes additional steps to ensure the safety and credibility of search results, providing warnings and supplemental information for potentially harmful content.

7.4.3 Key Differences

  • Algorithm Complexity: Google’s algorithm is often considered more complex and sophisticated, with a stronger emphasis on understanding user intent and context through advanced language models.
  • Content Quality: Both search engines prioritize high-quality content, but Google places a stronger emphasis on E-A-T (Expertise, Authoritativeness, Trustworthiness) principles.
  • User Privacy: Bing places a significant focus on user privacy and content moderation, ensuring that search results are credible and safe.

7.5 Improving the Accuracy of a RAG-Based System

If a Retrieval-Augmented Generation (RAG) system is not providing accurate results due to issues with the retrieval component, here are the steps to improve it:

  1. Evaluate and Diagnose
  • Analyze Retrieval Performance: Use metrics such as precision, recall, and F1-score to evaluate the current performance of the retrieval system.
  • Identify Failure Points: Determine where the retrieval system is failing. Common issues include missing relevant documents, retrieving irrelevant documents, or poor ranking of results.
  2. Data Quality Improvement
  • Data Cleaning: Ensure that the data used for retrieval is clean and well-structured. Remove duplicates, correct errors, and standardize formats.
  • Data Enrichment: Enhance the dataset with additional metadata and context to improve retrieval accuracy.
  3. Indexing Optimization
  • Chunking: Split documents into smaller, meaningful chunks to improve the granularity of retrieval.
  • Embedding Quality: Use high-quality embedding models to convert text into vectors. Fine-tune these models on domain-specific data if necessary.
  4. Retrieval Algorithm Enhancement
  • Hybrid Retrieval: Combine lexical search (e.g., BM25) with semantic search (e.g., vector embeddings) to leverage the strengths of both approaches (see the sketch after this list).
  • Re-Ranking: Implement a re-ranking step where retrieved documents are re-evaluated using more sophisticated models to improve the final ranking.
  5. Query Processing
  • Query Expansion: Expand user queries with synonyms and related terms to improve recall.
  • Query Rewriting: Rewrite queries to be more precise and contextually relevant.
  6. Continuous Evaluation and Tuning
  • Automated Testing Framework: Set up a framework to continuously test the retrieval system with a diverse set of queries and measure performance.
  • User Feedback Loop: Incorporate user feedback to identify and correct retrieval errors.
  7. Advanced Techniques
  • Contextual Retrieval: Use contextual embeddings to retain important context within document chunks.
  • Domain-Specific Fine-Tuning: Fine-tune retrieval models on domain-specific datasets to improve relevance.
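
To make the hybrid-retrieval and re-ranking ideas above concrete, here is a minimal Python sketch that blends a lexical score with a semantic (embedding) similarity score using a weighted sum. The toy corpus, the crude term-overlap scorer standing in for BM25, and the character-trigram "embedding" are illustrative assumptions, not a prescribed implementation.

    import math
    from collections import Counter

    # Toy corpus (illustrative only).
    DOCS = {
        "d1": "hybrid retrieval combines lexical and semantic search",
        "d2": "bm25 is a classic lexical ranking function",
        "d3": "vector embeddings capture semantic similarity",
    }

    def lexical_score(query, doc):
        """Crude term-overlap score standing in for BM25."""
        q, d = Counter(query.lower().split()), Counter(doc.lower().split())
        return sum(min(q[t], d[t]) for t in q)

    def embed(text):
        """Placeholder embedding: character-trigram counts. A real system
        would call a trained embedding model here."""
        return Counter(text[i:i + 3] for i in range(len(text) - 2))

    def cosine(a, b):
        dot = sum(a[k] * b[k] for k in a if k in b)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    def hybrid_rank(query, docs, alpha=0.5):
        """Normalize each signal to [0, 1] and blend with weight alpha."""
        lex = {d: lexical_score(query, t) for d, t in docs.items()}
        sem = {d: cosine(embed(query), embed(t)) for d, t in docs.items()}

        def norm(scores):
            hi = max(scores.values()) or 1.0
            return {d: s / hi for d, s in scores.items()}

        lex, sem = norm(lex), norm(sem)
        blended = {d: alpha * lex[d] + (1 - alpha) * sem[d] for d in docs}
        return sorted(blended.items(), key=lambda kv: kv[1], reverse=True)

    print(hybrid_rank("combine lexical and semantic retrieval", DOCS))

In a production pipeline the blended list would typically be passed to a cross-encoder re-ranker as a second stage; the weighted sum here only illustrates the fusion step.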

7.6 Keyword-Based Retrieval Method

Keyword-based retrieval is a traditional method used in information retrieval systems to find documents that contain specific keywords from a user’s query. Here’s an explanation of how it works:

7.6.1 Indexing

  • Inverted Index: The core of keyword-based retrieval is the inverted index, which maps each keyword to the list of documents that contain it. This allows for quick lookups of documents based on the presence of keywords.
  • Tokenization: Documents are broken down into individual words or tokens. These tokens are then indexed to facilitate fast retrieval.

7.6.2 Query Processing

  • Query Parsing: The user’s query is parsed to extract keywords. This may involve removing stop words (common words like “the”, “and”, “is”) and stemming (reducing words to their root form, e.g., “running” to “run”).
  • Boolean Operators: Users can use Boolean operators (AND, OR, NOT) to refine their queries. For example, “apple AND orange” retrieves documents containing both keywords, while “apple OR orange” retrieves documents containing either keyword.

7.6.3 Retrieval

  • Matching: The system searches the inverted index to find documents that contain the query keywords. This is typically done using Boolean logic to combine the results of individual keyword searches.
  • Scoring: Documents are scored based on the frequency and position of the keywords. Common scoring methods include Term Frequency-Inverse Document Frequency (TF-IDF) and BM25 (see the sketch after this list).
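
The following sketch ties indexing, matching, and scoring together: it tokenizes a toy corpus, builds an inverted index, and scores matching documents with a simple TF-IDF weight. The corpus and tokenizer are illustrative assumptions; a production system would use BM25 and more careful text analysis.

    import math
    from collections import defaultdict, Counter

    DOCS = {
        "d1": "the quick brown fox jumps over the lazy dog",
        "d2": "a quick tour of keyword based retrieval",
        "d3": "retrieval systems rank documents by relevance",
    }

    def tokenize(text):
        return text.lower().split()

    # Build the inverted index: term -> {doc_id: term frequency}.
    index = defaultdict(dict)
    for doc_id, text in DOCS.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf

    def tfidf_search(query, k=3):
        """Score every document that contains at least one query term."""
        n_docs = len(DOCS)
        scores = Counter()
        for term in tokenize(query):
            postings = index.get(term, {})
            if not postings:
                continue
            idf = math.log(n_docs / len(postings))
            for doc_id, tf in postings.items():
                scores[doc_id] += tf * idf
        return scores.most_common(k)

    print(tfidf_search("quick retrieval"))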

7.6.4 Ranking

  • Relevance Ranking: The retrieved documents are ranked based on their relevance to the query. Higher scores indicate greater relevance.
  • Presentation: The top-ranked documents are presented to the user in order of relevance.

7.6.5 Advantages

  • Efficiency: Keyword-based retrieval is fast and efficient, especially for large datasets.
  • Simplicity: The method is straightforward to implement and understand.
  • Control: Users have control over the search process through the use of Boolean operators.

7.6.6 Disadvantages

  • Lack of Context: The method does not understand the context or semantics of the query, leading to potential mismatches.
  • Synonym Handling: It may miss relevant documents that use synonyms or related terms not included in the query.
  • Precision vs. Recall: Balancing precision (retrieving relevant documents) and recall (retrieving all relevant documents) can be challenging.

7.7 How to Fine-Tune Re-Ranking Models

Fine-tuning re-ranking models is essential for improving the accuracy and relevance of search results in information retrieval systems. Here are the steps to fine-tune these models:

7.7.1 Data Preparation

  • Collect Training Data: Gather a large dataset of query-document pairs with relevance labels. Public datasets like MS MARCO or TREC can be useful.
  • Preprocess Data: Clean and preprocess the data by tokenizing text, removing stop words, and normalizing text (e.g., lowercasing).

7.7.2 Model Selection

  • Choose a Base Model: Select a pre-trained model suitable for re-ranking, such as BERT, RoBERTa, or a cross-encoder model.
  • Initialize Model: Load the pre-trained weights and prepare the model for fine-tuning.

7.7.3 Fine-Tuning Process

  • Define Loss Function: Use a loss function appropriate for ranking tasks, such as pairwise loss (e.g., hinge loss) or listwise loss (e.g., ListNet).
  • Training Loop: Implement a training loop that iteratively updates the model weights based on the loss function, using gradient descent and backpropagation (a minimal sketch follows after this list).
  • Batching: Use mini-batches to efficiently process large datasets and improve training stability.
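
Below is a minimal PyTorch sketch of the pairwise fine-tuning step described above. The tiny feed-forward scorer and random features are stand-ins for a real cross-encoder and real labeled query-document pairs; only the training pattern (score a relevant and a non-relevant document, apply a margin-based pairwise loss, backpropagate) is the point.

    import torch
    import torch.nn as nn

    # Stand-in scorer: a real setup would fine-tune a cross-encoder
    # (e.g. a BERT-style model) that scores a (query, document) pair.
    scorer = nn.Sequential(nn.Linear(32, 16), nn.ReLU(), nn.Linear(16, 1))
    optimizer = torch.optim.AdamW(scorer.parameters(), lr=1e-3)
    loss_fn = nn.MarginRankingLoss(margin=1.0)  # pairwise hinge-style loss

    for step in range(100):
        # Hypothetical mini-batch: features for (query, relevant doc)
        # and (query, non-relevant doc) pairs; real features would come
        # from a labeled dataset such as MS MARCO.
        pos_features = torch.randn(8, 32)
        neg_features = torch.randn(8, 32)

        pos_scores = scorer(pos_features).squeeze(-1)
        neg_scores = scorer(neg_features).squeeze(-1)

        # Target = 1 means "pos_scores should exceed neg_scores by the margin".
        target = torch.ones_like(pos_scores)
        loss = loss_fn(pos_scores, neg_scores, target)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()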

7.7.4 Evaluation and Validation

  • Split Data: Divide the dataset into training, validation, and test sets to evaluate model performance.
  • Metrics: Use ranking-specific metrics such as NDCG (Normalized Discounted Cumulative Gain), MAP (Mean Average Precision), and MRR (Mean Reciprocal Rank) to assess the model.
  • Hyperparameter Tuning: Experiment with different hyperparameters (e.g., learning rate, batch size) to optimize model performance.

7.7.5 Deployment and Monitoring

  • Deploy Model: Integrate the fine-tuned model into the retrieval system for re-ranking search results.
  • Monitor Performance: Continuously monitor the model’s performance in production using real-time metrics and user feedback.
  • Iterate: Regularly update the model with new data and re-fine-tune to maintain and improve accuracy.

7.8 Common Loss Functions Used in Machine Learning

7.8.1 Mean Squared Error (MSE)

  • Description: Measures the average squared difference between predicted and actual values.
  • Use Case: Commonly used in regression tasks.
  • Formula: \text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2

7.8.2 Cross-Entropy Loss

  • Description: Measures the difference between two probability distributions, often used for classification tasks.
  • Use Case: Widely used in binary and multi-class classification.
  • Formula:
    • Binary: \text{Cross-Entropy} = -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right]
    • Categorical: \text{Cross-Entropy} = -\sum_{i=1}^{n} \sum_{c=1}^{C} y_{i,c} \log(\hat{y}_{i,c})

7.8.3 Hinge Loss

  • Description: Used for “maximum-margin” classification, primarily with support vector machines (SVMs).
  • Use Case: Binary classification tasks.
  • Formula: \text{Hinge Loss} = \frac{1}{n} \sum_{i=1}^{n} \max(0, 1 - y_i \cdot \hat{y}_i)

7.8.4 Kullback-Leibler Divergence (KL Divergence)

  • Description: Measures how one probability distribution diverges from a second, expected probability distribution.
  • Use Case: Often used in variational autoencoders and other probabilistic models.
  • Formula: D_{KL}(P || Q) = \sum_{i} P(i) \log \frac{P(i)}{Q(i)}

7.8.5 Mean Absolute Error (MAE)

  • Description: Measures the average absolute difference between predicted and actual values.
  • Use Case: Regression tasks where outliers are less influential.
  • Formula: \text{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|

7.8.6 Huber Loss

  • Description: Combines the best properties of MSE and MAE, less sensitive to outliers than MSE.
  • Use Case: Regression tasks with outliers.
  • Formula:
    • \text{Huber Loss} = \begin{cases} \frac{1}{2}(y_i - \hat{y}_i)^2 & \text{for } |y_i - \hat{y}_i| \leq \delta \\ \delta |y_i - \hat{y}_i| - \frac{1}{2}\delta^2 & \text{otherwise} \end{cases}

7.8.7 Triplet Loss

  • Description: Used to learn embeddings by comparing anchor, positive, and negative samples.
  • Use Case: Face recognition, metric learning.
  • Formula: \text{Triplet Loss} = \max(0, d(a, p) - d(a, n) + \alpha)
    • Where d is the distance function, a is the anchor, p is the positive sample, n is the negative sample, and \alpha is the margin.
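
For concreteness, here is a small NumPy sketch that evaluates several of these losses on toy arrays. The input values are arbitrary examples chosen only to mirror the formulas above.

    import numpy as np

    y_true = np.array([1.0, 0.0, 1.0, 1.0])   # targets
    y_pred = np.array([0.9, 0.2, 0.6, 0.4])   # model outputs / probabilities

    mse = np.mean((y_true - y_pred) ** 2)
    mae = np.mean(np.abs(y_true - y_pred))

    eps = 1e-12  # avoid log(0)
    bce = -np.mean(y_true * np.log(y_pred + eps)
                   + (1 - y_true) * np.log(1 - y_pred + eps))

    # Hinge loss expects labels in {-1, +1} and raw scores.
    labels = np.where(y_true > 0, 1.0, -1.0)
    scores = np.array([0.8, -0.5, 0.1, -0.3])
    hinge = np.mean(np.maximum(0.0, 1.0 - labels * scores))

    # Triplet loss on toy embeddings with margin alpha.
    anchor, positive, negative = np.random.randn(3, 8)
    alpha = 0.2
    triplet = max(0.0, np.linalg.norm(anchor - positive)
                  - np.linalg.norm(anchor - negative) + alpha)

    print(mse, mae, bce, hinge, triplet)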

7.9 Most Common Metric Used in Information Retrieval and When It Fails

7.9.1 Precision@k

Precision@k is one of the most common metrics used in information retrieval to evaluate the relevance of the top-k retrieved documents. It measures the proportion of relevant documents among the top-k results.

7.9.2 Definition

  • Formula: \text{Precision@k} = \frac{\text{Number of relevant documents in top-k}}{k}
  • Interpretation: A higher Precision@k indicates that a larger proportion of the top-k retrieved documents are relevant to the query (a minimal computation sketch follows below).
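
Here is the minimal computation referred to above, using an illustrative ranked list and relevance set:

    def precision_at_k(ranked_ids, relevant_ids, k):
        """Fraction of the top-k retrieved documents that are relevant."""
        top_k = ranked_ids[:k]
        return sum(1 for doc_id in top_k if doc_id in relevant_ids) / k

    # Illustrative example: 3 of the top 5 results are relevant.
    ranked = ["d1", "d7", "d3", "d9", "d4"]
    relevant = {"d1", "d3", "d4", "d8"}
    print(precision_at_k(ranked, relevant, k=5))  # 0.6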

7.9.3 When It Fails

  1. Ignoring Position of Relevant Documents:
    • Issue: Precision@k does not consider the position of relevant documents within the top-k results. For example, if relevant documents are ranked lower within the top-k, Precision@k treats it the same as if they were ranked higher.
    • Impact: This can lead to misleading evaluations where the metric suggests good performance even if the most relevant documents are not ranked at the top.
  2. Lack of Recall Consideration:
    • Issue: Precision@k focuses only on the top-k results and does not account for how many relevant documents are missed beyond the top-k.
    • Impact: This can be problematic in scenarios where it is important to retrieve as many relevant documents as possible, such as in legal or medical searches.
  3. Binary Relevance Assumption:
    • Issue: Precision@k assumes binary relevance (a document is either relevant or not), which may not capture the varying degrees of relevance.
    • Impact: This can oversimplify the evaluation, especially in cases where some documents are more relevant than others.

7.9.4 Alternatives and Complements

To address these limitations, other metrics are often used in conjunction with Precision@k:

  • Recall@k: Measures the proportion of relevant documents retrieved out of all relevant documents available.
  • F1@k: The harmonic mean of Precision@k and Recall@k, providing a balanced evaluation.
  • NDCG (Normalized Discounted Cumulative Gain): Considers the position of relevant documents and their graded relevance, offering a more nuanced evaluation.

7.10 Evaluation Metric for a Quora-like Question-Answering System

To ensure users find the most pertinent answers as quickly as possible in a Quora-like question-answering system, the F1 Score would be an ideal evaluation metric. The F1 Score balances precision and recall, providing a comprehensive measure of the system’s effectiveness.

7.10.1 Why F1 Score?

  1. Precision:
    • Definition: Precision measures the proportion of relevant answers among the retrieved answers.
    • Importance: High precision ensures that the answers provided to users are relevant, reducing the likelihood of irrelevant or incorrect answers.
  2. Recall:
    • Definition: Recall measures the proportion of relevant answers that were retrieved out of all relevant answers available.
    • Importance: High recall ensures that the system retrieves as many relevant answers as possible, increasing the chances that users find the information they need.
  3. F1 Score:
    • Definition: The F1 Score is the harmonic mean of precision and recall, providing a single metric that balances both aspects.
    • Formula: \text{F1 Score} = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
    • Importance: The F1 Score is particularly useful when there is an uneven class distribution, ensuring that both precision and recall are considered equally.

7.10.2 When F1 Score Fails

While the F1 Score is a robust metric, it has limitations:

  1. Ignoring Answer Position:
    • Issue: The F1 Score does not consider the position of relevant answers within the list of retrieved answers.
    • Impact: Users may still need to sift through multiple answers to find the most relevant one, affecting the user experience.
  2. Binary Relevance:
    • Issue: The F1 Score assumes binary relevance (an answer is either relevant or not), which may not capture varying degrees of relevance.
    • Impact: This can oversimplify the evaluation, especially in cases where some answers are more relevant than others.

7.10.3 Complementary Metrics

To address these limitations, consider using additional metrics:

  • Mean Reciprocal Rank (MRR): Measures the rank of the first relevant answer, emphasizing the importance of presenting relevant answers at the top.
  • Normalized Discounted Cumulative Gain (NDCG): Considers the position and relevance of answers, providing a more nuanced evaluation.

By using the F1 Score along with complementary metrics like MRR and NDCG, you can ensure a comprehensive evaluation of your question-answering system’s effectiveness.
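
As a small illustration, the sketch below computes precision, recall, F1, and the reciprocal rank for one query over a hypothetical set of retrieved answers; the answer IDs and relevance judgments are illustrative only.

    def precision_recall_f1(retrieved, relevant):
        retrieved, relevant = set(retrieved), set(relevant)
        hits = len(retrieved & relevant)
        precision = hits / len(retrieved) if retrieved else 0.0
        recall = hits / len(relevant) if relevant else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        return precision, recall, f1

    def reciprocal_rank(ranked, relevant):
        """1 / rank of the first relevant answer (0 if none is retrieved)."""
        for rank, item in enumerate(ranked, start=1):
            if item in relevant:
                return 1.0 / rank
        return 0.0

    ranked_answers = ["a3", "a1", "a5", "a2"]   # system output, best first
    relevant_answers = {"a1", "a2"}             # ground-truth relevant answers
    print(precision_recall_f1(ranked_answers, relevant_answers))  # ≈ (0.5, 1.0, 0.667)
    print(reciprocal_rank(ranked_answers, relevant_answers))      # 0.5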

7.11 Evaluation Metrics for a Recommendation System

To evaluate the effectiveness of a recommendation system, several metrics can be used. Here are some of the most common and useful ones:

  1. Precision@K
  • Description: Measures the proportion of recommended items in the top-K set that are relevant.
  • Formula: \text{Precision@K} = \frac{\text{Number of relevant items in top-K}}{K}
  • Use Case: Useful when the focus is on the relevance of the top recommendations.
  2. Recall@K
  • Description: Measures the proportion of all relevant items that are recommended in the top-K set.
  • Formula: \text{Recall@K} = \frac{\text{Number of relevant items in top-K}}{\text{Total number of relevant items}}
  • Use Case: Important when it is crucial to retrieve as many relevant items as possible.
  3. F1 Score@K
  • Description: The harmonic mean of Precision@K and Recall@K, providing a balance between the two metrics.
  • Formula: \text{F1 Score@K} = 2 \cdot \frac{\text{Precision@K} \cdot \text{Recall@K}}{\text{Precision@K} + \text{Recall@K}}
  • Use Case: Useful when both precision and recall are important.
  4. Mean Average Precision (MAP)
  • Description: Measures the mean of the average precision scores for each query.
  • Formula: \text{MAP} = \frac{1}{Q} \sum_{q=1}^{Q} \text{Average Precision}(q)
  • Use Case: Provides a single-figure measure of quality across multiple queries.
  5. Normalized Discounted Cumulative Gain (NDCG)
  • Description: Measures the ranking quality by considering the position of relevant items in the recommended list.
  • Formula: \text{NDCG@K} = \frac{DCG@K}{IDCG@K}
    • Where DCG@K is the Discounted Cumulative Gain at K, and IDCG@K is the Ideal DCG at K.
  • Use Case: Useful when the order of recommendations is important (a minimal computation sketch follows after this list).
  6. Mean Reciprocal Rank (MRR)
  • Description: Measures the average of the reciprocal ranks of the first relevant item.
  • Formula: \text{MRR} = \frac{1}{Q} \sum_{q=1}^{Q} \frac{1}{\text{rank}_q}
  • Use Case: Useful when it is important to have relevant items ranked very high.
  7. Coverage
  • Description: Measures the proportion of items that can be recommended.
  • Formula: \text{Coverage} = \frac{\text{Number of items recommended}}{\text{Total number of items}}
  • Use Case: Important for understanding the system’s ability to recommend a diverse set of items.
  8. Diversity
  • Description: Measures how diverse the recommended items are.
  • Formula: Various methods can be used, such as the average dissimilarity between recommended items.
  • Use Case: Important for ensuring that recommendations are not too similar to each other.
  9. Serendipity
  • Description: Measures the ability of the system to recommend unexpected but relevant items.
  • Formula: Various methods can be used, often involving user feedback.
  • Use Case: Important for enhancing user satisfaction by providing novel recommendations.
  10. User Satisfaction
  • Description: Measures the overall satisfaction of users with the recommendations.
  • Formula: Typically gathered through user surveys and feedback.
  • Use Case: Directly reflects the user experience and satisfaction.
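
The sketch referred to above computes Precision@K, Recall@K, and a binary-relevance NDCG@K for a single user's recommendation list; the item IDs and relevance set are illustrative.

    import math

    def precision_at_k(recommended, relevant, k):
        top_k = recommended[:k]
        return sum(1 for item in top_k if item in relevant) / k

    def recall_at_k(recommended, relevant, k):
        top_k = recommended[:k]
        return sum(1 for item in top_k if item in relevant) / len(relevant)

    def ndcg_at_k(recommended, relevant, k):
        """Binary-relevance NDCG@K: gain 1 for relevant items, 0 otherwise."""
        dcg = sum(1.0 / math.log2(i + 2)
                  for i, item in enumerate(recommended[:k]) if item in relevant)
        ideal_hits = min(len(relevant), k)
        idcg = sum(1.0 / math.log2(i + 2) for i in range(ideal_hits))
        return dcg / idcg if idcg else 0.0

    recommended = ["i2", "i9", "i1", "i7", "i5"]   # ranked recommendations
    relevant = {"i1", "i2", "i5"}                  # items the user actually liked
    k = 5
    print(precision_at_k(recommended, relevant, k))  # 0.6
    print(recall_at_k(recommended, relevant, k))     # 1.0
    print(ndcg_at_k(recommended, relevant, k))       # ≈ 0.89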

7.12 Mean Average Precision (MAP) in Detail

Mean Average Precision (MAP) is a metric used to evaluate the performance of information retrieval systems, such as search engines and recommendation systems. It provides a single-figure measure of quality across multiple queries by averaging the precision scores at different ranks.

7.12.1 Key Concepts

  1. Precision:
    • Definition: Precision at a given rank k (Precision@k) is the proportion of relevant documents among the top k retrieved documents.
    • Formula: \text{Precision@k} = \frac{\text{Number of relevant documents in top-} k}{k}
  2. Average Precision (AP):
    • Definition: Average Precision for a single query is the average of the precision values obtained after each relevant document is retrieved.
    • Formula: \text{AP} = \frac{1}{R} \sum_{k=1}^{N} P(k) \cdot \text{rel}(k)
      • Where R is the total number of relevant documents for the query.
      • N is the total number of retrieved documents.
      • P(k) is the precision at rank k.
      • \text{rel}(k) is a binary indicator function that is 1 if the document at rank k is relevant, and 0 otherwise.
  3. Mean Average Precision (MAP):
    • Definition: MAP is the mean of the Average Precision scores for a set of queries.
    • Formula: \text{MAP} = \frac{1}{Q} \sum_{q=1}^{Q} \text{AP}(q)
      • Where Q is the total number of queries.
      • \text{AP}(q) is the Average Precision for query q.

7.12.2 Steps to Calculate MAP

  1. Calculate Precision@k for Each Query:
    • For each query, calculate the precision at each rank k where a relevant document is retrieved.
  2. Compute Average Precision (AP) for Each Query:
    • Average the precision values obtained at each rank where a relevant document is retrieved.
  3. Compute Mean Average Precision (MAP):
    • Average the AP scores across all queries to get the MAP.

7.12.3 Example

Let’s consider a simple example with two queries and a retrieval system that returns a ranked list of documents for each query.

7.12.3.1 Query 1:

  • Relevant documents: {D1, D3, D4}
  • Retrieved documents: [D1, D2, D3, D4, D5]

7.12.3.2 Query 2:

  • Relevant documents: {D2, D5}
  • Retrieved documents: [D2, D1, D5, D3, D4]

Calculating AP for Query 1:

  • Precision@1 = 1/1 = 1.0 (D1 is relevant)
  • Precision@3 = 2/3 = 0.67 (D3 is relevant)
  • Precision@4 = 3/4 = 0.75 (D4 is relevant)
  • AP = (1.0 + 0.67 + 0.75) / 3 = 0.81

Calculating AP for Query 2:

  • Precision@1 = 1/1 = 1.0 (D2 is relevant)
  • Precision@3 = 2/3 = 0.67 (D5 is relevant)
  • AP = (1.0 + 0.67) / 2 = 0.835

Calculating MAP:

  • MAP = (0.81 + 0.835) / 2 = 0.8225
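
The following sketch reproduces the worked example in code. It keeps full precision, so the result differs slightly from the 0.8225 above, which rounds the intermediate precision values to two decimals.

    def average_precision(retrieved, relevant):
        """Mean of Precision@k over the ranks k at which relevant documents appear."""
        hits, precisions = 0, []
        for k, doc in enumerate(retrieved, start=1):
            if doc in relevant:
                hits += 1
                precisions.append(hits / k)
        return sum(precisions) / len(relevant) if relevant else 0.0

    ap_q1 = average_precision(["D1", "D2", "D3", "D4", "D5"], {"D1", "D3", "D4"})
    ap_q2 = average_precision(["D2", "D1", "D5", "D3", "D4"], {"D2", "D5"})
    map_score = (ap_q1 + ap_q2) / 2
    print(round(ap_q1, 3), round(ap_q2, 3), round(map_score, 3))  # 0.806 0.833 0.819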

7.12.4 When MAP Fails

  1. Binary Relevance:
    • Issue: MAP assumes binary relevance (a document is either relevant or not), which may not capture varying degrees of relevance.
    • Impact: This can oversimplify the evaluation, especially in cases where some documents are more relevant than others.
  2. Position Sensitivity:
    • Issue: While MAP considers the rank of relevant documents, it does not differentiate between the importance of different positions as effectively as metrics like NDCG.
    • Impact: This can lead to less emphasis on the top-ranked results, which are often the most critical for user satisfaction.

7.13 How Hybrid Search Works

Hybrid search combines the strengths of both keyword-based (lexical) search and vector-based (semantic) search to enhance the overall search experience. This approach leverages the precision of keyword matching and the contextual understanding of semantic search.

7.13.1 Key Components

  1. Lexical Search:
    • Description: Focuses on matching exact keywords within a query.
    • Strengths: Excels in scenarios where precision is paramount, such as retrieving documents with exact keyword matches.
    • Techniques: Uses traditional search algorithms like BM25 to rank documents based on keyword relevance.
  2. Semantic Search:
    • Description: Uses vector embeddings to capture the semantic meaning of queries and documents.
    • Strengths: Understands the context and intent behind user queries, providing more relevant results even if exact keywords are not present.
    • Techniques: Employs machine learning models to generate embeddings and uses algorithms like Hierarchical Navigable Small World (HNSW) for efficient similarity search.

7.13.2 How Hybrid Search Operates

  1. Indexing:
    • Combined Index: The search index contains fields for both plain text (for lexical search) and vector embeddings (for semantic search). This allows the system to handle both types of queries simultaneously.
  2. Query Processing:
    • Hybrid Query: A single query request includes parameters for both keyword search and vector search. These queries are executed in parallel.

    • Example:

      {
        "search": "historic hotel walk to restaurants and shopping",
        "vectorQueries": [
          {
            "kind": "vector",
            "vector": [ /* array of embeddings */ ],
            "k": 50,
            "fields": "DescriptionVector"
          }
        ]
      }
  3. Retrieval and Ranking:
    • Parallel Execution: Both lexical and vector searches are performed simultaneously. The results from each method are retrieved and ranked using their respective algorithms (e.g., BM25 for text, HNSW for vectors).
    • Result Merging: The results from both searches are merged using algorithms like Reciprocal Rank Fusion (RRF), which combines the rankings to produce a unified result set (a minimal sketch follows after this list).
  4. Final Ranking:
    • Unified Results: The merged results are presented to the user in a single ranked list, ensuring that both precise keyword matches and contextually relevant results are included.
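
Here is the minimal Reciprocal Rank Fusion sketch referred to above. The two input rankings are illustrative, and k = 60 is the smoothing constant commonly used with RRF.

    def reciprocal_rank_fusion(rankings, k=60):
        """Merge several ranked lists: score(d) = sum over lists of 1 / (k + rank)."""
        scores = {}
        for ranked_list in rankings:
            for rank, doc_id in enumerate(ranked_list, start=1):
                scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
        return sorted(scores, key=scores.get, reverse=True)

    lexical_results = ["d3", "d1", "d7", "d2"]   # e.g. a BM25 ranking
    semantic_results = ["d1", "d4", "d3", "d9"]  # e.g. a vector-similarity ranking
    print(reciprocal_rank_fusion([lexical_results, semantic_results]))
    # ['d1', 'd3', 'd4', 'd7', 'd2', 'd9']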

7.14 Merging and Homogenizing Search Results from Multiple Methods

To merge and homogenize search results from multiple methods into a single result set, follow these steps:

  1. Collect Results: Gather search results from different methods (e.g., web search, image search, news search).

  2. Normalize Scores: Each method might have its own scoring system. Normalize these scores to a common scale, such as 0 to 1.

  3. Rank Aggregation: Use rank aggregation techniques to combine the normalized scores. Common methods include:

  • Borda Count: Assign points based on the rank position in each method and sum these points.
  • Condorcet Method: Compare each pair of results and determine which one is preferred more often.
  • Weighted Average: Assign weights to each method based on their reliability or relevance and calculate a weighted average score.
  4. Remove Duplicates: Identify and remove duplicate results. Ensure that each unique result is only listed once.

  5. Sort and Present: Sort the results based on the aggregated scores and present them in a unified list (a minimal sketch of the normalization and aggregation steps follows below).
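
The sketch referred to above implements steps 2-5 with a weighted average: min-max normalize each method's scores to [0, 1], blend them with per-method weights, and let the accumulation handle duplicates so each result appears once. The result sets and weights are illustrative.

    def min_max_normalize(scores):
        """Map raw scores onto [0, 1]; a constant list maps to 1.0."""
        lo, hi = min(scores.values()), max(scores.values())
        if hi == lo:
            return {doc: 1.0 for doc in scores}
        return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}

    def merge_results(results_by_method, weights):
        merged = {}
        for method, scores in results_by_method.items():
            for doc, s in min_max_normalize(scores).items():
                # Summing handles duplicates: a result found by several methods
                # accumulates weighted score but appears only once.
                merged[doc] = merged.get(doc, 0.0) + weights[method] * s
        return sorted(merged.items(), key=lambda kv: kv[1], reverse=True)

    results = {
        "web":  {"u1": 12.0, "u2": 7.5, "u3": 3.0},
        "news": {"u2": 0.9, "u4": 0.4},
    }
    weights = {"web": 0.7, "news": 0.3}
    print(merge_results(results, weights))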

7.15 Handling Multi-Hop/Multifaceted Queries

To effectively handle multi-hop or multifaceted queries, follow these steps:

  1. Decompose the Query: Break down the complex query into smaller, manageable sub-queries or steps.

  2. Identify Dependencies: Determine if there are dependencies between the sub-queries. Some steps may need to be completed before others.

  3. Sequential Processing: Address each sub-query in sequence, ensuring that the results of one step feed into the next as needed.

  4. Parallel Processing: If sub-queries are independent, process them in parallel to save time.

  5. Aggregate Results: Combine the results from the sub-queries to form a comprehensive answer to the original query.

  6. Iterative Refinement: Review the aggregated results and refine them if necessary to ensure they fully address the original query (a short illustrative sketch follows this list).
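
The short illustrative sketch below decomposes a two-hop question into dependent sub-queries and feeds the first answer into the second. answer_subquery is a hypothetical stand-in for whatever retrieval or question-answering component the system actually uses; the canned answers only keep the sketch runnable.

    def answer_subquery(subquery: str) -> str:
        """Hypothetical retrieval/QA call; a real system would query its
        search index or a language model here."""
        canned = {
            "Who founded company X?": "Jane Doe",
            "Where was Jane Doe born?": "Springfield",
        }
        return canned.get(subquery, "unknown")

    def answer_multi_hop(question: str) -> str:
        # 1. Decompose the question into dependent sub-queries.
        first_hop = "Who founded company X?"
        # 2. Sequential processing: the second hop depends on the first answer.
        founder = answer_subquery(first_hop)
        second_hop = f"Where was {founder} born?"
        birthplace = answer_subquery(second_hop)
        # 3. Aggregate the hops into a final answer.
        return f"{founder} founded company X and was born in {birthplace}."

    print(answer_multi_hop("Where was the founder of company X born?"))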

7.16 Techniques to Improve Retrieval

To enhance the effectiveness of information retrieval, consider employing the following techniques:

  1. Query Expansion:

Use synonyms, related terms, and variations of the original query to broaden the search scope. Implement automatic query expansion based on user behavior and search context.

  2. Relevance Feedback:

Incorporate user feedback to refine search results. Use techniques like the Rocchio algorithm to adjust the query vector based on relevant and non-relevant documents (a minimal sketch appears at the end of this section).

  3. Semantic Search:

Utilize natural language processing (NLP) to understand the context and meaning behind queries. Implement knowledge graphs to connect related concepts and improve result relevance.

  4. Personalization:

Tailor search results based on user preferences, search history, and behavior. Use machine learning models to predict and prioritize results that match user interests.

  5. Ranking Algorithms:

Apply advanced ranking algorithms like BM25, PageRank, or learning-to-rank models to order search results by relevance. Continuously update and optimize ranking models based on user interactions and feedback.

  6. Indexing Techniques:

Use inverted indexes, suffix trees, or other efficient data structures to speed up search operations. Implement real-time indexing to ensure the search index is always up-to-date.

  7. Entity Recognition:

Identify and prioritize named entities (e.g., people, places, organizations) within documents to improve retrieval accuracy. Use entity linking to connect mentions of the same entity across different documents.

  8. Contextual Search:

Incorporate contextual information such as user location, time, and device to refine search results. Use context-aware algorithms to adapt search results based on the current context.
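
To make the Rocchio relevance-feedback update from item 2 concrete, here is a minimal NumPy sketch: the modified query vector moves toward the centroid of known relevant documents and away from the centroid of non-relevant ones. The toy term-weight vectors and the coefficient values are illustrative assumptions.

    import numpy as np

    def rocchio(query_vec, relevant_docs, non_relevant_docs,
                alpha=1.0, beta=0.75, gamma=0.15):
        """Classic Rocchio update: q' = alpha*q + beta*mean(rel) - gamma*mean(nonrel)."""
        rel_centroid = np.mean(relevant_docs, axis=0) if len(relevant_docs) else 0.0
        nonrel_centroid = (np.mean(non_relevant_docs, axis=0)
                           if len(non_relevant_docs) else 0.0)
        return alpha * query_vec + beta * rel_centroid - gamma * nonrel_centroid

    # Toy term-weight vectors over a 4-term vocabulary.
    query = np.array([1.0, 0.0, 0.5, 0.0])
    relevant = np.array([[0.9, 0.1, 0.7, 0.0],
                         [0.8, 0.0, 0.6, 0.1]])
    non_relevant = np.array([[0.0, 0.9, 0.0, 0.8]])

    print(rocchio(query, relevant, non_relevant))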