Add Bulk Ai Review #1020
     Closed
      
      
    
                
23889ec to 9dde8f7
- Refactor the existing single-string review code so that it reads the review from the database when one already exists for the translation.
- Query translations that have not been reviewed yet, run the bulk review, and save the result in AiReviewProto as a JSON blob. The blob will be processed with MySQL JSON queries instead of being split into separate columns.

An option to remove an existing review can be added later; for now, this can be done directly in the database.
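A minimal sketch of the read-before-review flow described above, in Java. An in-memory map stands in for the database table, and each review is kept as an opaque JSON string. `BulkReviewSketch`, `reviewOne`, and `reviewBulk` are hypothetical names, not the PR's actual classes, and the reviewer functions stand in for the AI call.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: an in-memory map stands in for the review table.
final class BulkReviewSketch {

    // translation id -> review stored as a single JSON blob
    private final Map<Long, String> reviewJsonByTranslationId = new LinkedHashMap<>();

    // Single-string path: reuse the stored review if present, otherwise compute and store it.
    String reviewOne(long translationId, Function<Long, String> reviewer) {
        return reviewJsonByTranslationId.computeIfAbsent(translationId, reviewer);
    }

    // Bulk path: only translations without a stored review are sent to the reviewer.
    // Returns the ids that were actually reviewed in this call.
    List<Long> reviewBulk(List<Long> translationIds,
                          Function<List<Long>, Map<Long, String>> bulkReviewer) {
        List<Long> notYetReviewed = new ArrayList<>();
        for (Long id : translationIds) {
            if (!reviewJsonByTranslationId.containsKey(id)) {
                notYetReviewed.add(id);
            }
        }
        if (!notYetReviewed.isEmpty()) {
            reviewJsonByTranslationId.putAll(bulkReviewer.apply(notYetReviewed));
        }
        return notYetReviewed;
    }
}
```

Keeping the review as one JSON blob means individual fields can later be read with MySQL's JSON functions (for example `JSON_EXTRACT` or the `->>` operator) instead of adding a column per field.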
9dde8f7 to b66d843
The change is mostly for clarity. In practice, only untranslated strings are sent to MT, so this case would require either concurrent MT or something else adding the translation, which is unlikely. MT would also have to return exactly the same string.
62a58de to b59174e
- Major: use Quartz with long polling to avoid blocking threads and to allow graceful continuation after a restart or error
- The model and batch mode can be selected via the CLI
- Add detailed reporting of job execution in the CLI
- Add a CLI option to attach to an existing job. The CLI long-polls the pollable task, but if the connection is lost, the detach option allows re-attaching to check the job's completion state
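The long-polling and re-attach behavior can be illustrated with plain `java.util.concurrent`; this sketch does not use Quartz. `PollableTaskRegistry` and `LongPollingClient` are hypothetical: each poll blocks on the server side for at most `pollTimeoutMs` instead of tying up a thread indefinitely, and because the client only needs the task id, re-attaching after a lost connection is just another call to `attach`.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical server-side registry of pollable tasks (no Quartz involved).
final class PollableTaskRegistry {
    private final Map<Long, CountDownLatch> doneSignals = new ConcurrentHashMap<>();

    void register(long taskId) {
        doneSignals.put(taskId, new CountDownLatch(1));
    }

    void complete(long taskId) {
        doneSignals.get(taskId).countDown();
    }

    // Long poll: block for at most timeoutMs; true if the task has finished.
    boolean waitForCompletion(long taskId, long timeoutMs) throws InterruptedException {
        CountDownLatch signal = doneSignals.get(taskId);
        if (signal == null) {
            throw new IllegalArgumentException("Unknown task " + taskId);
        }
        return signal.await(timeoutMs, TimeUnit.MILLISECONDS);
    }
}

// Hypothetical CLI side: attaching and re-attaching only need the task id.
final class LongPollingClient {
    // Returns the number of polls it took to see completion, or -1 if it gave up.
    static int attach(PollableTaskRegistry registry, long taskId, long pollTimeoutMs, int maxPolls) {
        try {
            for (int poll = 1; poll <= maxPolls; poll++) {
                if (registry.waitForCompletion(taskId, pollTimeoutMs)) {
                    return poll;
                }
                // Still running: a real CLI would print progress here and poll again.
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return -1;
    }
}
```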
- Keep the table denormalized for simplicity.
- Fix passing down the tmTextUnitIds in batch mode.
- max_token is not applicable to reasoning models and is deprecated in favor of maxCompletionToken.
- Use a constant for max_token; try using null later to remove the arbitrary value.
- Add a temperature helper: reasoning models require a temperature of 1.
This is not backward compatible: the options now default to null instead of their previous default values.
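A sketch of the temperature helper and the null-by-default options, assuming OpenAI-style request parameters (`temperature`, `max_completion_tokens`). The rule used to detect reasoning models (names such as `o1` or `o3-mini`) is an assumption, and `ReasoningModelOptions` is a hypothetical class. Options left as `null` are omitted from the request, so the API's own defaults would apply.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; the reasoning-model name rule is an assumption.
final class ReasoningModelOptions {

    // Assumption: reasoning models are named like "o1", "o3-mini", "o4-mini".
    static boolean isReasoningModel(String model) {
        return model != null && model.matches("o\\d.*");
    }

    // Reasoning models only accept a temperature of 1; other models keep the
    // requested value, which may be null (not set).
    static Double temperatureFor(String model, Double requested) {
        return isReasoningModel(model) ? Double.valueOf(1.0) : requested;
    }

    // Builds OpenAI-style request parameters; null options are left out.
    static Map<String, Object> requestParams(String model, Double temperature, Integer maxCompletionTokens) {
        Map<String, Object> params = new LinkedHashMap<>();
        params.put("model", model);
        Double effectiveTemperature = temperatureFor(model, temperature);
        if (effectiveTemperature != null) {
            params.put("temperature", effectiveTemperature);
        }
        if (maxCompletionTokens != null) {
            // max_completion_tokens replaces the deprecated max_tokens
            params.put("max_completion_tokens", maxCompletionTokens);
        }
        return params;
    }
}
```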
If a batch cannot be created for a locale, keep working with the other locales whose batches were created successfully. Previously, the batches would run for nothing and the import logic would never be called. Show in the console which locales are now skipped because there is nothing to process, and which ones need to be reprocessed because their batches failed to start.
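One way to express the per-locale handling above: locales with nothing pending are skipped, locales whose batch fails to start are reported for reprocessing, and the rest continue. `LocaleBatchPlanner` and the `createBatch` predicate are hypothetical stand-ins for the real batch creation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical planner; createBatch stands in for the real batch creation call.
final class LocaleBatchPlanner {

    static final class Plan {
        final List<String> started = new ArrayList<>();
        final List<String> skipped = new ArrayList<>(); // nothing to process
        final List<String> failed = new ArrayList<>();  // to reprocess later
    }

    static Plan plan(Map<String, Integer> pendingCountByLocale, Predicate<String> createBatch) {
        Plan plan = new Plan();
        for (Map.Entry<String, Integer> entry : pendingCountByLocale.entrySet()) {
            String locale = entry.getKey();
            if (entry.getValue() == 0) {
                plan.skipped.add(locale);
            } else if (createBatch.test(locale)) {
                plan.started.add(locale);
            } else {
                // A locale that fails to start no longer stops the others.
                plan.failed.add(locale);
            }
        }
        // A real CLI would print these three lists to the console.
        return plan;
    }
}
```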
If the import fails due to a transient issue, we don’t want to lose all the work already done in the batches. This option allows the import to resume from where it left off.
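A minimal sketch of a resumable import, assuming each batch has a stable id and progress is checkpointed after every successful batch import. `ResumableImporter` is hypothetical, and the in-memory set stands in for wherever the real job persists its progress.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical importer; the set stands in for persisted job progress.
final class ResumableImporter {
    private final Set<String> importedBatchIds = new LinkedHashSet<>();

    // Imports each batch at most once, so a rerun resumes where the last run stopped.
    // Returns the number of batches imported in this run.
    int importAll(List<String> batchIds, Consumer<String> importBatch) {
        int imported = 0;
        for (String batchId : batchIds) {
            if (importedBatchIds.contains(batchId)) {
                continue; // already done in a previous run
            }
            importBatch.accept(batchId);   // may throw on a transient error
            importedBatchIds.add(batchId); // checkpoint only after success
            imported++;
        }
        return imported;
    }
}
```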
- The review type can now be specified as a command-line argument
- Four types for now: ALL (used by the frontend), description rating, source rating, glossary extraction
- Output types are still tied to typed classes, but may become configurable in the future
- Fix an issue with the frontend review

All options (prompt, input type, output type) could eventually be passed via the CLI and/or configured in application.properties, but we’ll hold off for now to avoid overcomplicating things.
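A sketch of parsing the review type from a command-line argument. The enum constant names are assumptions derived from the four types listed above, and `fromCliArg` is a hypothetical helper that also accepts values such as `glossary-extraction`.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical constant names for the four review types.
enum ReviewType {
    ALL,                 // used by the frontend
    DESCRIPTION_RATING,
    SOURCE_RATING,
    GLOSSARY_EXTRACTION;

    // Accepts values such as "glossary-extraction" or "SOURCE_RATING".
    static ReviewType fromCliArg(String arg) {
        String normalized = arg.trim().toUpperCase(Locale.ROOT).replace('-', '_');
        try {
            return valueOf(normalized);
        } catch (IllegalArgumentException e) {
            String expected = Arrays.stream(values()).map(Enum::name).collect(Collectors.joining(", "));
            throw new IllegalArgumentException(
                    "Unknown review type '" + arg + "', expected one of: " + expected, e);
        }
    }
}
```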
Lots of copy/pasta from aiReview; sharing the code is not trivial, though. We'll check later.
This is intended to improve context for better translations. It still needs to be validated, as it could also add noise. The --related-strings option supports two types for now:

- USAGES: strings that typically appear together in the same source file, but really any strings that share a value in the usage field. This makes it easy to group strings from an email template.
- ID_PREFIX: the ID prefix is typically used in code to organize strings into relevant groups. The prefix before the first dot is used to look up related strings.

Related strings are currently limited to 10,000 characters; this could become an option later.
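A sketch of the two lookup modes and the character budget. `RelatedStrings` and `TextUnit` are hypothetical types, usages are modeled as a set of strings, and the 10,000-character limit is assumed to apply to the total source text of the related strings.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical types; not the project's actual entities.
final class RelatedStrings {

    enum Mode { USAGES, ID_PREFIX }

    static final class TextUnit {
        final String id;
        final String source;
        final Set<String> usages;

        TextUnit(String id, String source, Set<String> usages) {
            this.id = id;
            this.source = source;
            this.usages = usages;
        }
    }

    static final int MAX_RELATED_CHARS = 10_000;

    // Prefix before the first dot, or the whole id if there is no dot.
    static String idPrefix(String id) {
        int dot = id.indexOf('.');
        return dot < 0 ? id : id.substring(0, dot);
    }

    static boolean isRelated(Mode mode, TextUnit target, TextUnit candidate) {
        switch (mode) {
            case USAGES:
                // Related if the two strings share at least one usage entry.
                return !Collections.disjoint(target.usages, candidate.usages);
            case ID_PREFIX:
                return idPrefix(target.id).equals(idPrefix(candidate.id));
            default:
                throw new IllegalArgumentException("Unsupported mode " + mode);
        }
    }

    // Collects related strings in candidate order until the character budget is reached.
    static List<TextUnit> find(Mode mode, TextUnit target, List<TextUnit> candidates, int maxChars) {
        List<TextUnit> related = new ArrayList<>();
        int totalChars = 0;
        for (TextUnit candidate : candidates) {
            if (candidate.id.equals(target.id) || !isRelated(mode, target, candidate)) {
                continue;
            }
            if (totalChars + candidate.source.length() > maxChars) {
                break;
            }
            related.add(candidate);
            totalChars += candidate.source.length();
        }
        return related;
    }
}
```

For example, `RelatedStrings.find(RelatedStrings.Mode.ID_PREFIX, target, candidates, RelatedStrings.MAX_RELATED_CHARS)` would group `email.welcome.body` with `email.welcome.title`, since both share the `email` prefix.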
  