
[Optimization] File integrity verification #29

@jien37

Description


Background:
Currently, if a network issue causes a segment download to fail, the entire video download fails, and the command must be re-run from the beginning. In the unlucky case where the same segment fails repeatedly, the video can never be retrieved.

Optimization:
After all segment download attempts complete, verify the integrity of the downloaded files and retry any failed segments.
(Network issues tend to persist for a while, so immediate short-interval retries may still fail. This pass supplements the previously implemented quick retry of failed segments.)

Temporary Modification:

Case hulu: ext/utils/hulu_jp.py#download_segment

    with ThreadPoolExecutor(max_workers=8) as executor:
        ......
        pbar.update(1)

    # MOD START:
    print("Starting file integrity verification......")
    for i, url in enumerate(segment_links):
        temp_path = os.path.join(base_temp_dir, f"{i:05d}.ts")
        if not os.path.exists(temp_path):
            fetch_and_save((i, url))
            if os.path.exists(temp_path):
                print(f"Successfully re-downloaded segment {i}: {url}")
            time.sleep(2)  # space retries out to ride out transient network issues

    print("Completed file integrity verification.")
    # MOD END

    # Merge segments into the final file
    output_path = os.path.join(base_temp_dir, name)
    with open(output_path, 'wb') as out_file:
        for i in range(len(segment_links)):
            temp_path = os.path.join(base_temp_dir, f"{i:05d}.ts")
            with open(temp_path, 'rb') as f:
                out_file.write(f.read())

Case Lemino: ext/global_func/util/download_util.py#segment_downloader#download

    with ThreadPoolExecutor(max_workers=8) as executor:
        ......
        pbar.update(1)

    # MOD START:
    print("Starting file integrity verification......")
    for i, url in enumerate(segment_links):
        temp_path = os.path.join(output_temp_directory, f"{i:05d}.ts")
        if not os.path.exists(temp_path):
            fetch_and_save((i, url))
            if os.path.exists(temp_path):
                print(f"Successfully re-downloaded segment {i}: {url}")
            time.sleep(2)  # space retries out to ride out transient network issues

    print("Completed file integrity verification.")
    # MOD END

    # Merge segments into the final file
    output_path = os.path.join(output_temp_directory, output_file_name)
    with open(output_path, 'wb') as out_file:
        for i in range(len(segment_links)):
            temp_path = os.path.join(output_temp_directory, f"{i:05d}.ts")
            with open(temp_path, 'rb') as f:
                out_file.write(f.read())

Failed log:

(screenshot)

Successful log:

(screenshot)

Please review this optimization. Thanks.
