alkaline-acid/PDFanticheck

PDFanticheck🛡️

Have you ever had worries like these:

  • When you upload a PDF file, the system flags it for sensitive content
  • A duplication check reports that the PDF file's duplication rate is too high
  • You do not want the PDF file to be copied by unauthorized people

If so, PDFanticheck can help. It converts the original PDF content into an image, then creates a new layer of randomly generated text on top of it. A system that extracts text from the file reads the random text rather than the original, while a human reader still sees the original content.
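The conversion described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes the third-party PyMuPDF library (imported as `fitz`), and the function name `obscure_pdf`, its parameters, the fixed text position, and the font size are all hypothetical choices made for the example.

```python
import random
import string


def obscure_pdf(src_path, dst_path, text_length=40, zoom=2.0, seed=None):
    """Rasterize every page of src_path and overlay invisible random text.

    Hypothetical helper for illustration; assumes PyMuPDF is installed.
    """
    import fitz  # PyMuPDF, imported here so the module loads without it

    rng = random.Random(seed)
    src = fitz.open(src_path)
    out = fitz.open()  # new, empty PDF
    for page in src:
        # Render the page to a bitmap; the original text layer is not copied.
        pix = page.get_pixmap(matrix=fitz.Matrix(zoom, zoom))
        new_page = out.new_page(width=page.rect.width, height=page.rect.height)
        new_page.insert_image(new_page.rect, pixmap=pix)

        # Overlay random letters. fill_opacity=0 makes them fully transparent,
        # so text extraction sees them but a human reader does not.
        decoy = "".join(rng.choices(string.ascii_letters, k=text_length))
        new_page.insert_text((36, 72), decoy, fontsize=8, fill_opacity=0)
    out.save(dst_path)
    out.close()
    src.close()
```

A call such as `obscure_pdf("input.pdf", "output.pdf")` would write a copy whose pages look the same but whose extractable text is the random string. Because the visible content survives as an image, OCR can still recover it, which matches the limitation noted below.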

Please note ⚠️: This method is not effective against OCR-based systems. The author also encourages openness and collaboration.😊

Arguments🛠

  • text_length: The length of the randomly generated text
  • opacity_range: The opacity of the generated text. If it is fully transparent, only machines can detect it; if it is opaque, human readers can see it too
  • font_size_range: The font size of the generated text
  • x_range: The x coordinate of the generated text
  • y_range: The y coordinate of the generated text
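One way these arguments might be turned into a concrete overlay is sketched below. The README does not state the value format, so this assumes each `*_range` argument is a `(min, max)` pair sampled uniformly and that the random text is drawn from ASCII letters and digits; the `OverlaySpec` class and `sample_overlay` function are hypothetical names used only for illustration.

```python
import random
import string
from dataclasses import dataclass


@dataclass
class OverlaySpec:
    """One randomly generated text overlay (hypothetical container)."""
    text: str
    opacity: float
    font_size: float
    x: float
    y: float


def sample_overlay(text_length, opacity_range, font_size_range,
                   x_range, y_range, rng=None):
    """Draw one overlay; each range is assumed to be a (min, max) pair."""
    rng = rng or random.Random()
    alphabet = string.ascii_letters + string.digits
    return OverlaySpec(
        text="".join(rng.choices(alphabet, k=text_length)),
        opacity=rng.uniform(*opacity_range),
        font_size=rng.uniform(*font_size_range),
        x=rng.uniform(*x_range),
        y=rng.uniform(*y_range),
    )


# Fully transparent text: detectable by machines, invisible to readers.
spec = sample_overlay(50, (0.0, 0.0), (8, 12), (0, 500), (0, 700))
```

Setting `opacity_range` to `(0.0, 0.0)` corresponds to the fully transparent case described in the list, while a range ending at `1.0` would allow overlays that readers can see.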

PDFanticheck🛡️

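Have you ever had any of the following concerns:

  • A PDF file is flagged for sensitive content when the system reviews your upload
  • A plagiarism check on a PDF file reports a duplication rate that is too high
  • You do not want unrelated people to copy your PDF file

If so, PDFanticheck can help. It converts the original PDF content into an image and adds a new layer of randomly generated text on top, so the system no longer reads the original text, while the human eye can still see the original content.

Please note ⚠️: This method does not work against OCR-based systems, and the author prefers to encourage an open, collaborative approach.😊

Parameters🛠

  • text_length: The length of the text to generate randomly
  • opacity_range: The opacity. Fully transparent text can be recognized only by machines; opaque text is also visible to the human eye
  • font_size_range: The font size
  • x_range: The x coordinate of the generated text
  • y_range: The y coordinate of the generated text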

Drop a star if you find this useful!

About

Make PDFs misread by machines and code while remaining readable by humans
