Deduplication and added identifiers: notes and discussion
#23
Closed
jasoneri
announced in
Announcements
Replies: 0 comments
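Examples
Scenario: original behavior
Because 1 and 2 have the same name, 1 is overwritten by 2 after it is downloaded, since both resolve to the same directory path on Windows. The same applies to 3.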
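Scenario: deduplication ✅
Selecting 1 gives md5('jm'+'comic1')=md5_1. The lookup table has no md5_1, so the work is downloaded and the directory is created:
Storage directory .../满开开花
Downloading 1 again finds md5_1 already in the table, so nothing is downloaded.
Selecting 2 gives md5('jm'+'comic2')=md5_2. The lookup table has no md5_2, so the work is downloaded, recorded in the table, and its content overwrites:
Storage directory .../满开开花
Selecting 3 gives md5('wnacg'+'comic1')=md5_3. The lookup table has no md5_3, so the work is downloaded, recorded in the table, and its content overwrites:
Storage directory .../满开开花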
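The lookup flow in this scenario can be sketched roughly as follows. This is only a minimal illustration, not the project's actual code: `work_key`, `DownloadRecord`, and `should_download` are hypothetical names, and the table is assumed to be an in-memory set, whereas a real record would need to persist between runs.

```python
import hashlib


def work_key(spider_name: str, work_id: str) -> str:
    """MD5 hex digest of spider_name + work_id, mirroring md5('jm'+'comic1')."""
    return hashlib.md5((spider_name + work_id).encode("utf-8")).hexdigest()


class DownloadRecord:
    """Hypothetical in-memory stand-in for the lookup table."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def should_download(self, spider_name: str, work_id: str) -> bool:
        """Return False if the work is already recorded; otherwise record it and return True."""
        key = work_key(spider_name, work_id)
        if key in self._seen:
            return False
        self._seen.add(key)
        return True


record = DownloadRecord()
decisions = [
    record.should_download("jm", "comic1"),     # 1: new, download
    record.should_download("jm", "comic1"),     # 1 again: already recorded, skip
    record.should_download("jm", "comic2"),     # 2: new, download
    record.should_download("wnacg", "comic1"),  # 3: new, download
]
print(decisions)  # [True, False, True, True]
```

Deduplication alone only decides whether to download. All three works still map to the same directory name, so the overwrite described in this scenario remains.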
Scenario: added identifier ❌
With or without deduplication, an existing directory is overwritten.
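Scenario: added identifier ✅
Append spider_name plus the unique work id to the end of the directory name. For example, downloading the three works above gives:
Storage directory .../满开开花[jm-comic1]
Storage directory .../满开开花[jm-comic2]
Storage directory .../满开开花[wnacg-comic1]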
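A naming helper along these lines would produce the three directory names above. It is a sketch under assumptions: `tagged_dir_name` is a hypothetical function, and the `downloads` base directory is a placeholder, since the real storage path is elided in the examples.

```python
from pathlib import PurePosixPath


def tagged_dir_name(title: str, spider_name: str, work_id: str) -> str:
    """Append [spider_name-work_id] to the title so same-named works get distinct directories."""
    return f"{title}[{spider_name}-{work_id}]"


# "downloads" is a placeholder base directory, not the project's real storage path.
base = PurePosixPath("downloads")
works = [("jm", "comic1"), ("jm", "comic2"), ("wnacg", "comic1")]
dirs = [str(base / tagged_dir_name("满开开花", spider, wid)) for spider, wid in works]

for d in dirs:
    print(d)
```

Because the suffix combines the spider name with the work id, two works only share a directory if both values match, which is the same condition the deduplication key uses.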
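Other notes
1. The id is actually customizable
Ids such as comic1 are only examples; in practice the id can be defined freely.
For example, md5('kaobei'+福利莲+第一话)=id (site, series title, and chapter, here chapter 1) can be used to deduplicate regular serialized manga. The finer-grained task breakdown for regular manga on the planned-items list will need this id.
2. A site moves a work with the same content from url to url2
This is not a common case. Downloading a duplicate in this situation does no real harm, and it only happens rarely.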
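For note 1, a chapter-level id could be derived in the same way as the work-level key. The sketch below is illustrative only: `chapter_id` is a hypothetical helper, and the second chapter name is an assumed value added for comparison.

```python
import hashlib


def chapter_id(spider_name: str, title: str, chapter: str) -> str:
    """MD5 hex digest of spider_name + title + chapter, mirroring md5('kaobei'+福利莲+第一话)."""
    raw = spider_name + title + chapter
    return hashlib.md5(raw.encode("utf-8")).hexdigest()


id_ch1 = chapter_id("kaobei", "福利莲", "第一话")
id_ch2 = chapter_id("kaobei", "福利莲", "第二话")  # assumed second chapter, for comparison only

print(id_ch1 != id_ch2)  # True: each chapter gets its own id
```

Plain concatenation follows the formula as written. If the parts could ever run together ambiguously, joining them with a separator before hashing would be a safer variant.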