
Update to use only summary bars for uploads when in notebooks #3243


Open: wants to merge 3 commits into base: main
45 changes: 35 additions & 10 deletions src/huggingface_hub/utils/_xet_progress_reporting.py
@@ -7,16 +7,29 @@


 class XetProgressReporter:
-    def __init__(self, n_lines: int = 10, description_width: int = 40):
+    """
+    Reports on progress for Xet uploads.
+
+    If per_file_progress is True, then per-file progress is shown in a scrolling list. Otherwise,
+    only the summary bars showing file processing progress and data upload are shown. By default,
+    the summary version is shown in notebooks and guis and the detailed file progress is shown in consoles.
Comment on lines +13 to +15 (Contributor)

Suggested change
If per_file_progress is True, then per-file progress is shown in a scrolling list. Otherwise,
only the summary bars showing file processing progress and data upload are shown. By default,
the summary version is shown in notebooks and guis and the detailed file progress is shown in consoles.
Shows summary progress bars when running in notebooks or GUIs, and detailed per-file progress in console environments.

Rephrase for when per_file_progress is not an option anymore

"""

def __init__(self, n_lines: int = 10, description_width: int = 30, per_file_progress=None):
Contributor
Suggested change
-    def __init__(self, n_lines: int = 10, description_width: int = 30, per_file_progress=None):
+    def __init__(self, n_lines: int = 10, description_width: int = 30):

Let's remove the argument here, since it's not used anywhere. Given that XetProgressReporter is only used internally, we don't need the flexibility.

         self.n_lines = n_lines
         self.description_width = description_width

+        if per_file_progress is None:
+            self.per_file_progress = tqdm.in_console()
+        else:
+            self.per_file_progress = per_file_progress
Comment on lines +22 to +25 (Contributor)
Suggested change
-        if per_file_progress is None:
-            self.per_file_progress = tqdm.in_console()
-        else:
-            self.per_file_progress = per_file_progress
+        self.per_file_progress = is_google_colab() or not is_notebook()

(given the above)


         self.tqdm_settings = {
             "unit": "B",
             "unit_scale": True,
             "leave": True,
             "unit_divisor": 1000,
-            "nrows": n_lines + 3,
+            "nrows": n_lines + 3 if self.per_file_progress else 3,
             "miniters": 1,
             "bar_format": "{l_bar}{bar}| {n_fmt:>5}B / {total_fmt:>5}B{postfix:>12}",
         }
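In summary mode the only setting that changes is nrows: n_lines + 3 rows with per-file progress, 3 rows without. A minimal sketch of that setting, assuming a hypothetical helper named build_tqdm_settings (not part of huggingface_hub) that returns the same dict as the diff:

```python
from typing import Any, Dict


def build_tqdm_settings(n_lines: int, per_file_progress: bool) -> Dict[str, Any]:
    """Hypothetical helper mirroring the tqdm_settings dict from the diff."""
    return {
        "unit": "B",
        "unit_scale": True,
        "leave": True,
        "unit_divisor": 1000,
        # Per-file mode adds n_lines scrolling bars to the 3 rows used in summary mode.
        "nrows": n_lines + 3 if per_file_progress else 3,
        "miniters": 1,
        "bar_format": "{l_bar}{bar}| {n_fmt:>5}B / {total_fmt:>5}B{postfix:>12}",
    }


print(build_tqdm_settings(10, True)["nrows"])   # 13
print(build_tqdm_settings(10, False)["nrows"])  # 3
```

The conditional expression binds as (n_lines + 3) if per_file_progress else 3, so summary mode ignores n_lines entirely.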
@@ -40,8 +53,13 @@ def __init__(self, n_lines: int = 10, description_width: int = 40):
     def format_desc(self, name: str, indent: bool) -> str:
         """
         if name is longer than width characters, prints ... at the start and then the last width-3 characters of the name, otherwise
-        the whole name right justified into 20 characters. Also adds some padding.
+        the whole name right justified into description_width characters. Also adds some padding.
         """
+
+        if not self.per_file_progress:
+            # Here we just use the defaults.
+            return name
+
         padding = " " if indent else ""
         width = self.description_width - len(padding)

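The hunk hides most of format_desc's body, so the following is only one plausible reading of its docstring, not the actual code. It is a hypothetical standalone format_desc that takes description_width and per_file_progress as parameters instead of reading them from self; where the padding is placed is an assumption.

```python
def format_desc(name: str, indent: bool, description_width: int = 30,
                per_file_progress: bool = True) -> str:
    """Hypothetical standalone version of XetProgressReporter.format_desc.

    Assumes description_width is greater than 4 so the truncation slice is non-empty.
    """
    # Summary-only mode: hand the name to tqdm unchanged (the early return in the diff).
    if not per_file_progress:
        return name

    padding = " " if indent else ""
    width = description_width - len(padding)

    if len(name) > width:
        # Too long: "..." followed by the last width-3 characters of the name.
        name = "..." + name[-(width - 3):]

    # Right-justify into the available width; putting the padding first is an assumption.
    return padding + name.rjust(width)


print(repr(format_desc("model-00001-of-00002.safetensors", indent=False)))
# '...-00001-of-00002.safetensors'
```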
@@ -74,6 +92,10 @@ def update_progress(self, total_update: PyTotalProgressUpdate, item_updates: Lis
                 self.completed_items.add(name)
                 new_completed.append(name)

+            # If we're only showing summary information, then don't update the individual bars
+            if not self.per_file_progress:
+                continue
+
             # If we've run out of bars to use, then collapse the last ones together.
             if bar_idx >= len(self.current_bars):
                 bar = self.current_bars[-1]
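The new continue sits after the completion bookkeeping, so that bookkeeping appears to run in both modes and only the per-file bar handling is skipped. A toy model of that loop; assign_bars and the (name, finished) tuples are hypothetical simplifications, not the real update objects:

```python
from typing import Dict, List, Set, Tuple


def assign_bars(updates: List[Tuple[str, bool]], completed: Set[str],
                n_bars: int, per_file_progress: bool) -> Dict[int, List[str]]:
    """Hypothetical toy model of the per-item loop in update_progress.

    Each update is (name, finished). Returns a mapping of bar slot -> item names.
    """
    slots: Dict[int, List[str]] = {}
    bar_idx = 0
    for name, finished in updates:
        # Bookkeeping runs in both modes.
        if finished:
            completed.add(name)

        # Summary-only mode: skip the individual bars entirely.
        if not per_file_progress:
            continue

        # Out of bars: collapse the remaining items onto the last one.
        slot = min(bar_idx, n_bars - 1)
        slots.setdefault(slot, []).append(name)
        bar_idx += 1
    return slots


updates = [("a.bin", True), ("b.bin", False), ("c.bin", False)]
print(assign_bars(updates, set(), n_bars=2, per_file_progress=True))
# {0: ['a.bin'], 1: ['b.bin', 'c.bin']}
print(assign_bars(updates, set(), n_bars=2, per_file_progress=False))
# {}
```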
@@ -111,10 +133,11 @@ def update_progress(self, total_update: PyTotalProgressUpdate, item_updates: Lis

                 del self.item_state[name]

-        # Now manually refresh each of the bars
-        for bar in self.current_bars:
-            if bar:
-                bar.refresh()
+        if self.per_file_progress:
+            # Now manually refresh each of the bars
+            for bar in self.current_bars:
+                if bar:
+                    bar.refresh()

         # Update overall bars
         def postfix(speed):
@@ -136,6 +159,8 @@ def postfix(speed):
     def close(self, _success):
         self.data_processing_bar.close()
         self.upload_bar.close()
-        for bar in self.current_bars:
-            if bar:
-                bar.close()
+
+        if self.per_file_progress:
+            for bar in self.current_bars:
+                if bar:
+                    bar.close()
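The last two hunks wrap the per-file bars in the same if self.per_file_progress guard, while the two summary bars are closed unconditionally. A dependency-free sketch of that shape; FakeBar and SketchReporter are hypothetical stand-ins for tqdm bars and XetProgressReporter:

```python
from typing import List, Optional


class FakeBar:
    """Hypothetical stand-in for a tqdm bar; it only records calls."""

    def __init__(self) -> None:
        self.refreshes = 0
        self.closed = False

    def refresh(self) -> None:
        self.refreshes += 1

    def close(self) -> None:
        self.closed = True


class SketchReporter:
    """Hypothetical reduced XetProgressReporter keeping only the guarded parts."""

    def __init__(self, per_file_progress: bool, n_lines: int = 3) -> None:
        self.per_file_progress = per_file_progress
        self.data_processing_bar = FakeBar()
        self.upload_bar = FakeBar()
        # Per-file slots; None marks an unused slot.
        self.current_bars: List[Optional[FakeBar]] = [None] * n_lines
        if per_file_progress:
            self.current_bars[0] = FakeBar()

    def refresh_file_bars(self) -> None:
        if self.per_file_progress:
            for bar in self.current_bars:
                if bar:
                    bar.refresh()

    def close(self) -> None:
        # The summary bars are closed in both modes.
        self.data_processing_bar.close()
        self.upload_bar.close()
        if self.per_file_progress:
            for bar in self.current_bars:
                if bar:
                    bar.close()


reporter = SketchReporter(per_file_progress=True)
reporter.refresh_file_bars()
reporter.close()
print(reporter.current_bars[0].refreshes, reporter.current_bars[0].closed)  # 1 True
```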
9 changes: 9 additions & 0 deletions src/huggingface_hub/utils/tqdm.py
@@ -89,6 +89,7 @@
 from typing import ContextManager, Dict, Iterator, Optional, Union

 from tqdm.auto import tqdm as old_tqdm
+from tqdm.std import tqdm as std_tqdm

 from ..constants import HF_HUB_DISABLE_PROGRESS_BARS

@@ -232,6 +233,14 @@ def __delattr__(self, attr: str) -> None:
             if attr != "_lock":
                 raise

+    @classmethod
+    def in_console(cls) -> bool:
+        """Returns true if running in a standard console environment and false if running in a notebook or gui."""
+
+        # Returns true if the current display method is the one in the standard tqdm class, or false if it's been
+        # overwritten by the gui, notebook, keras, etc. subclassing it.
+        return cls.display is std_tqdm.display
+
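in_console relies on the fact that, in Python 3, looking up a method on a class returns the plain function object: a subclass that merely inherits display gets the very same object as the base class, while one that overrides it (as the diff's comment says the notebook and GUI variants do) gets a different object. A toy reproduction without tqdm; StdBar, NotebookBar, and WrappedBar are hypothetical stand-ins for tqdm.std.tqdm, a notebook subclass, and a wrapper that inherits display unchanged.

```python
class StdBar:
    """Hypothetical stand-in for tqdm.std.tqdm."""

    def display(self, msg=None):
        return "text"

    @classmethod
    def in_console(cls) -> bool:
        # Same identity test as the diff: is cls.display still the base-class function?
        return cls.display is StdBar.display


class NotebookBar(StdBar):
    """Hypothetical notebook/GUI subclass that overrides display."""

    def display(self, msg=None):
        return "widget"


class WrappedBar(StdBar):
    """Hypothetical wrapper that inherits display unchanged."""


print(WrappedBar.in_console())   # True
print(NotebookBar.in_console())  # False
```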
Comment on lines +236 to +243 (Contributor)
I'd remove this method and use a combination of is_notebook and is_google_colab:

  • if is_google_colab => per-file progress (Colab handles this quite well)
  • else if is_notebook => summary only
  • else => per-file progress

(You can import them like this: from huggingface_hub.utils import is_google_colab)
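The three bullets reduce to the one-liner suggested for lines +22 to +25. A quick truth-table check; the two boolean parameters stand in for is_google_colab() and is_notebook() so the sketch runs without huggingface_hub installed, and both function names are hypothetical.

```python
def per_file_progress_from_rules(in_colab: bool, in_notebook: bool) -> bool:
    """Hypothetical: the bullet list above, written out branch by branch."""
    if in_colab:
        return True   # Colab: per-file progress
    if in_notebook:
        return False  # other notebooks: summary bars only
    return True       # console: per-file progress


def per_file_progress_one_liner(in_colab: bool, in_notebook: bool) -> bool:
    """Hypothetical: mirrors is_google_colab() or not is_notebook()."""
    return in_colab or not in_notebook


for in_colab in (True, False):
    for in_notebook in (True, False):
        rules = per_file_progress_from_rules(in_colab, in_notebook)
        one_liner = per_file_progress_one_liner(in_colab, in_notebook)
        # Both formulations must agree on every combination.
        assert rules == one_liner
        print(in_colab, in_notebook, one_liner)
```

Only the "notebook but not Colab" case turns per-file progress off, which is why a single or/not expression is enough.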


 @contextmanager
 def tqdm_stream_file(path: Union[Path, str]) -> Iterator[io.BufferedReader]: