feat: Durable Web Search golem:web-search API Across Different Providers in Rust #66
Conversation
My last PR (#46) was closed. I accidentally deleted the branch from GitHub, which caused the PR to close automatically. Fortunately, I had the branch saved locally, so I opened this new PR.
web-search/brave/src/lib.rs
let page_size = self.params.max_results.unwrap_or(10);
let current_offset = *self.current_offset.borrow();
let new_offset = current_offset + page_size;
Looking at the Brave docs, the offset should be incremented by 1 to get the next page of results, not by page_size:
https://api-dashboard.search.brave.com/app/documentation/web-search/query#WebSearchAPIQueryParameters
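A minimal sketch of what that fix could look like, reusing the field name from the diff above (the surrounding struct is hypothetical, not the actual implementation):

use std::cell::RefCell;

struct BravePagination {
    current_offset: RefCell<u32>,
}

impl BravePagination {
    // Brave's `offset` query parameter is a page index, so the next page is
    // current_offset + 1, not current_offset + page_size.
    fn advance(&self) -> u32 {
        let new_offset = *self.current_offset.borrow() + 1;
        *self.current_offset.borrow_mut() = new_offset;
        new_offset
    }
}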
web-search/google/src/lib.rs
"No more results available".to_string(), | ||
)); | ||
} | ||
*self.current_start_index.borrow_mut() = *self.current_start_index.borrow_mut() + 1_u32; |
This should be incremented by num, not by 1: https://developers.google.com/custom-search/v1/reference/rest/v1/cse/list#response
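A comparable sketch for the Google side, again with a hypothetical struct wrapped around the existing field name:

use std::cell::RefCell;

struct GooglePagination {
    current_start_index: RefCell<u32>,
}

impl GooglePagination {
    // Google Custom Search's `start` is a 1-based result index, so it has to move
    // forward by `num` (the page size), e.g. 1 -> 11 when num = 10.
    fn advance(&self, num: u32) -> u32 {
        let next_start = *self.current_start_index.borrow() + num;
        *self.current_start_index.borrow_mut() = next_start;
        next_start
    }
}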
Requested access to the demo video; please grant access.
@mschuwalow, granted access for the video. Please check, and I will update the changes asap.
Ohh, I understand now. That is why, when we transition to live, it starts from the beginning.
@mschuwalow Please check; I have updated the changes.
@mschuwalow Are we ready to merge?
web-search/brave/src/lib.rs
println!("[DURABILITY] unwrapped_search_session: Creating new BraveSearchSession");
LOGGING_STATE.with_borrow_mut(|state| state.init());

with_config_key(&[Self::API_KEY_ENV_VAR], Err, |keys| {
Minor: please move reading the env var into the next_page call.
Otherwise a restored session might use a different API key than a live session (as they end up reading the environment variables at different times). This might make sense, but for now we want to avoid divergence between replay and live as much as possible.
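A sketch of that arrangement using a plain std::env read; the constant value and helper name here are assumptions (the real code goes through with_config_key and Self::API_KEY_ENV_VAR):

use std::env;

// Assumed variable name, for illustration only.
const API_KEY_ENV_VAR: &str = "BRAVE_API_KEY";

// Resolved on every next_page call rather than once at session construction,
// so a restored session and a live session read the key at the same point.
fn api_key_for_request() -> Result<String, String> {
    env::var(API_KEY_ENV_VAR).map_err(|_| format!("{} is not set", API_KEY_ENV_VAR))
}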
We will remove the env reading and eliminate that function. I'm going to take out unwrapped_search_session and move the env reading to next_page, which will introduce some overhead due to creating a new client and reading env each time. However, it's a necessary adjustment.
web-search/brave/src/lib.rs
let (results, metadata) =
    convert_response_to_results(response, &self.params, Some(current_offset));

*self.last_metadata.borrow_mut() = metadata.clone();

if results.is_empty() {
    *self.has_more_results.borrow_mut() = false;
    return Err(SearchError::BackendError("No more results".to_string()));
}

if let Some(metadata) = &metadata {
    let api_has_more = metadata.next_page_token.is_some();
    let within_limits = new_offset < 9;
    *self.has_more_results.borrow_mut() = api_has_more && within_limits;
} else {
    *self.has_more_results.borrow_mut() = false;
}
*self.current_page.borrow_mut() += 1;
results
    .into_iter()
    .next()
    .ok_or_else(|| SearchError::BackendError("No results returned".to_string()))
The logic that you execute after the request is sent to update the internal state will need to be reproduced in session_for_page, as otherwise the internal state of the session will be different for live and replay (which will lead to different results).
I think your life will be easier if you move all checks / calculations into the beginning of next_page if possible. That way both replay and live should behave consistently. If this is not possible, you might need to persist additional state as part of replay so you can restore the internal session state.
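A sketch of that shape with hypothetical names: all pagination bookkeeping is derived up front from a counter that exists identically in live and replay, before any network call is made:

use std::cell::RefCell;

struct Session {
    current_page: RefCell<u32>,
}

impl Session {
    // Called at the top of next_page; nothing after the request mutates pagination state.
    fn begin_next_page(&self) -> u32 {
        let page = *self.current_page.borrow() + 1;
        *self.current_page.borrow_mut() = page;
        page
    }
}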
Hey, I have a solution for this. Since we are tracking the page_count in the durable layer, we can eliminate the calculations from next_page and create a separate method in the extended trait that updates the pagination state with the tracked page count from the durable layer. In the case of replay, we will increase page_count by one so that it starts exactly where the crash happened.
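A sketch of what that extended-trait method could look like; the trait, method, and field names here are assumptions rather than the actual code:

use std::cell::RefCell;

trait ExtendedSearchSession {
    // Overwrite the pagination state with the page count recorded in the durable layer.
    fn restore_pagination(&self, replayed_page_count: u32);
}

struct BraveSearchSession {
    current_page: RefCell<u32>,
}

impl ExtendedSearchSession for BraveSearchSession {
    fn restore_pagination(&self, replayed_page_count: u32) {
        // Resume one page past the last durably recorded one, i.e. where the crash happened.
        *self.current_page.borrow_mut() = replayed_page_count + 1;
    }
}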
Okay, I will update asap.
…hod across multiple web search components to streamline session management.
@mschuwalow Updated the changes; please take a look.
Will complete Exa asap.
@mschuwalow Please check now; I have addressed all the comments.
let state = self.state.borrow();
let result = match &*state {
    Some(DurableSearchSessionState::Live { session, .. }) => session.get_metadata(),
    _ => None,
This case needs to be handled (we just finished replay and the user asks us for the current metadata).
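One possible shape for covering that case, mirroring the snippet above; the Replay variant and its last_metadata field are assumptions about the enum, not the actual definition:

let state = self.state.borrow();
let result = match &*state {
    Some(DurableSearchSessionState::Live { session, .. }) => session.get_metadata(),
    Some(DurableSearchSessionState::Replay { last_metadata, .. }) => last_metadata.clone(),
    None => None,
};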
with_persistence_level(PersistenceLevel::PersistNothing, || {
    session.next_page()
});
let persisted_result = durability.persist_infallible(NoInput, result.clone());
You should use persist here instead of persist_infallible.
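A sketch of the substitution being asked for, assuming result is the Result returned by next_page (as in the snippet above) and that persist records both success and failure, while persist_infallible only ever records a plain value:

let result = with_persistence_level(PersistenceLevel::PersistNothing, || {
    session.next_page()
});
let persisted_result = durability.persist(NoInput, result.clone());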
@mschuwalow I am trying to understand the difference between persist_infallible and persist, but I cannot find any documentation on that. Could you please explain?
For now, I am reverting EXA since it does not support pagination.
@mschuwalow Updated the changes. Please take a look.
I decided to go with #60, as it's closer to being merged. Thank you for your work on this; I can tell you put a lot of effort into it.
@mschuwalow, why are you still asking for changes on that PR, yet you closed mine? Did you at least look at the code after the changes?
@mschuwalow Last time you did the same with #2. My implementation was good from a maintenance perspective as well, but you still chose that PR.
Added Tavily, Serper, Brave & Google custom search.
Demo link : https://drive.google.com/file/d/16RxMlgtNzEp9Z1vYN-JZskMXBHARhuAQ/view?usp=sharing
Note: Microsoft Bing is not added. Microsoft will discontinue Bing Search API support after 11 August 2025.
Ref: https://www.microsoft.com/en-us/bing/apis/bing-web-search-api
claim #34
/fixes #34