
feat(gen ai): showcase different options for computation-based metric #12756


Open
wants to merge 1 commit into base: main
34 changes: 33 additions & 1 deletion generative_ai/evaluation/get_rouge_score.py
@@ -23,6 +23,8 @@ def get_rouge_score() -> EvalResult:
import pandas as pd

import vertexai

from vertexai.generative_models import GenerativeModel
from vertexai.preview.evaluation import EvalTask

# TODO(developer): Update & uncomment line below
@@ -37,7 +39,37 @@ def get_rouge_score() -> EvalResult:
life, including endangered species, it faces serious threats from
climate change, ocean acidification, and coral bleaching."""

# Compare pre-generated model responses against the reference (ground truth).
# Option 1: Run model inference and evaluate the model response against the reference (ground truth)
Member:

The code sample looks too big now!

Member Author:

Yep, I understand

Member Author (@Valeriy-Burlaka, Nov 8, 2024):

@msampathkumar, I'm thinking about showcasing two different options for using the computation-based metrics: Bring-your-own-response (BYOR) and running model inference. The reason is that, for me as a developer, the line between these options wasn't immediately obvious (hence this issue with the "prompt" column being silently unused), so I want to make it crystal clear.
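To make the distinction concrete, here is a minimal sketch (not part of the PR diff) of the dataset each option expects, using the column names from this sample; the placeholder strings are hypothetical:

import pandas as pd

# Option 1 (model inference): "prompt" + "reference"; the model is passed to
# evaluate() and generates the responses at evaluation time.
inference_dataset = pd.DataFrame(
    {
        "prompt": ["Summarize the following text: ..."],
        "reference": ["<ground-truth summary>"],
    }
)

# Option 2 (BYOR): "response" + "reference"; no model is passed to evaluate(),
# and a "prompt" column is not used for computation-based metrics.
byor_dataset = pd.DataFrame(
    {
        "response": ["<pre-generated model response>"],
        "reference": ["<ground-truth summary>"],
    }
)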

Member:

While I understand your point, this code sample is still too big (100 lines). Let me check with the tech writing team.

Member:

Also note, I don't see any example response section for this part of the code.

model = GenerativeModel(model_name="gemini-1.5-flash-002")
eval_dataset = pd.DataFrame(
{
"prompt": [
"""Summarize the following text:

The Great Barrier Reef, located off the coast of Queensland in northeastern
Australia, is the world's largest coral reef system. Stretching over 2,300
kilometers, it is composed of over 2,900 individual reefs and 900 islands.
The reef is home to a wide variety of marine life, including many endangered
species. However, climate change, ocean acidification, and coral bleaching
pose significant threats to its ecosystem."""
],
"reference": [reference_summarization],
}
)
# Check the API reference for more details and examples:
# https://cloud.google.com/vertex-ai/generative-ai/docs/reference/python/latest/vertexai.evaluation.EvalTask
eval_task = EvalTask(
dataset=eval_dataset,
metrics=[
"rouge_1",
"rouge_2",
"rouge_l",
"rouge_l_sum",
],
)
result = eval_task.evaluate(model=model)
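# Note (not part of the original diff): a possible way to produce an example
# response section for the docs is to print the aggregated and per-instance
# scores. This assumes the returned EvalResult exposes `summary_metrics` (a
# dict) and `metrics_table` (a pandas DataFrame), as in
# vertexai.preview.evaluation.
# print(result.summary_metrics)
# print(result.metrics_table)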

# Option 2: Bring-your-own-response (BYOR): use pre-generated model responses for evaluation
eval_dataset = pd.DataFrame(
{
"response": [
(remaining diff truncated)
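The BYOR branch of the diff is truncated above. For reference, here is a minimal, self-contained sketch of what the bring-your-own-response flow can look like with this API. It is an illustration, not the PR's actual continuation: the project/location placeholders and the placeholder response string are hypothetical, and it assumes evaluate() can be called without a model when the dataset already contains a "response" column, mirroring Option 1 above otherwise.

import pandas as pd
import vertexai
from vertexai.preview.evaluation import EvalTask

# TODO(developer): Update & uncomment line below
# vertexai.init(project="your-project-id", location="us-central1")

reference_summarization = "<ground-truth summary>"

# Pre-generated model responses are evaluated directly; no model is passed
# to evaluate(), so no inference happens at evaluation time.
eval_dataset = pd.DataFrame(
    {
        "response": ["<pre-generated model response>"],
        "reference": [reference_summarization],
    }
)
eval_task = EvalTask(
    dataset=eval_dataset,
    metrics=["rouge_1", "rouge_2", "rouge_l", "rouge_l_sum"],
)
result = eval_task.evaluate()
print(result.summary_metrics)  # assumes EvalResult exposes summary_metrics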