This repository was archived by the owner on Feb 19, 2021. It is now read-only.

Commit 3b72d38

Merge pull request #254 from danielquinn/mcronce-disable_encryption
Allow encryption to be disabled
2 parents d5876cc + 631d316 commit 3b72d38

22 files changed, +391 -115 lines changed

.gitignore

Lines changed: 2 additions & 2 deletions
@@ -59,8 +59,8 @@ target/

 # Stored PDFs
 media/documents/*.gpg
-media/documents/thumbnails/*.gpg
-media/documents/originals/*.gpg
+media/documents/thumbnails/*
+media/documents/originals/*

 # Sqlite database
 db.sqlite3

docker-compose.env.example

Lines changed: 3 additions & 2 deletions
@@ -1,8 +1,9 @@
 # Environment variables to set for Paperless
 # Commented out variables will be replaced by a default within Paperless.

-# Passphrase Paperless uses to encrypt and decrypt your documents
-PAPERLESS_PASSPHRASE=CHANGE_ME
+# Passphrase Paperless uses to encrypt and decrypt your documents, if you want
+# encryption at all.
+# PAPERLESS_PASSPHRASE=CHANGE_ME

 # The amount of threads to use for text recognition
 # PAPERLESS_OCR_THREADS=4

docs/changelog.rst

Lines changed: 29 additions & 0 deletions
@@ -1,6 +1,35 @@
 Changelog
 #########

+2.0.0
+=====
+
+This is a big release as we've changed a core-functionality of Paperless: we no
+longer encrypt files with GPG by default.
+
+The reasons for this are many, but it boils down to that the encryption wasn't
+really all that useful, as files on-disk were still accessible so long as you
+had the key, and the key was most typically stored in the config file. In
+other words, your files are only as safe as the ``paperless`` user is. In
+addition to that, *the contents of the documents were never encrypted*, so
+important numbers etc. were always accessible simply by querying the database.
+Still, it was better than nothing, but the consensus from users appears to be
+that it was more an annoyance than anything else, so this feature is now turned
+off unless you explicitly set a passphrase in your config file.
+
+Migrating from 1.x
+------------------
+
+Encryption isn't gone, it's just off for new users. So long as you have
+``PAPERLESS_PASSPHRASE`` set in your config or your environment, Paperless
+should continue to operate as it always has. If however, you want to drop
+encryption too, you only need to do two things:
+
+1. Run ``./manage.py migrate && ./manage.py change_storage_type gpg unencrypted``.
+   This will go through your entire database and Decrypt All The Things.
+2. Remove ``PAPERLESS_PASSPHRASE`` from your ``paperless.conf`` file, or simply
+   stop declaring it in your environment.
+
 1.4.0
 =====

docs/consumption.rst

Lines changed: 2 additions & 1 deletion
@@ -17,7 +17,8 @@ The primary method of getting documents into your database is by putting them in
 the consumption directory. The ``document_consumer`` script runs in an infinite
 loop looking for new additions to this directory and when it finds them, it goes
 about the process of parsing them with the OCR, indexing what it finds, and
-encrypting the PDF, storing it in the media directory.
+encrypting the PDF (if ``PAPERLESS_PASSPHRASE`` is set), storing it in the
+media directory.

 Getting stuff into this directory is up to you. If you're running Paperless
 on your local computer, you might just want to drag and drop files there, but if

docs/migrating.rst

Lines changed: 1 addition & 1 deletion
@@ -16,7 +16,7 @@ Backing Up
 ----------

 So you're bored of this whole project, or you want to make a remote backup of
-the unencrypted files for whatever reason. This is easy to do, simply use the
+your files for whatever reason. This is easy to do, simply use the
 :ref:`exporter <utilities-exporter>` to dump your documents and database out
 into an arbitrary directory.

docs/setup.rst

Lines changed: 13 additions & 10 deletions
@@ -63,17 +63,18 @@ Standard (Bare Metal)

 1. Install the requirements as per the :ref:`requirements <requirements>` page.
 2. Within the extract of master.zip go to the ``src`` directory.
-3. Copy ``../paperless.conf.example`` to ``/etc/paperless.conf`` also the virtual
-   envrionment look there for it and open it in your favourite editor.
-   Because this file contains passwords it should only be readable by user root
-   and paperless ! Set the values for:
+3. Copy ``../paperless.conf.example`` to ``/etc/paperless.conf`` and open it in
+   your favourite editor. Because this file contains passwords it should only
+   be readable by user root and paperless! Set the values for:

    * ``PAPERLESS_CONSUMPTION_DIR``: this is where your documents will be
      dumped to be consumed by Paperless.
-   * ``PAPERLESS_PASSPHRASE``: this is the passphrase Paperless uses to
-     encrypt/decrypt the original document.
    * ``PAPERLESS_OCR_THREADS``: this is the number of threads the OCR process
      will spawn to process document pages in parallel.
+   * ``PAPERLESS_PASSPHRASE``: this is only required if you want to use GPG to
+     encrypt your document files. This is the passphrase Paperless uses to
+     encrypt/decrypt the original documents. Don't worry about defining this
+     if you don't want to use encryption (the default).

 4. Initialise the SQLite database with ``./manage.py migrate``.
 5. Create a user for your Paperless instance with
@@ -139,7 +140,8 @@ Docker Method

    ``PAPERLESS_PASSPHRASE``
      This is the passphrase Paperless uses to encrypt/decrypt the original
-     document.
+     document. If you aren't planning on using GPG encryption, you can just
+     leave this undefined.

    ``PAPERLESS_OCR_THREADS``
      This is the number of threads the OCR process will spawn to process
@@ -265,10 +267,11 @@ Vagrant Method
 3. Run ``vagrant ssh`` and once inside your new vagrant box, edit
    ``/etc/paperless.conf`` and set the values for:

-   * ``PAPERLESS_CONSUMPTION_DIR``: this is where your documents will be
+   * ``PAPERLESS_CONSUMPTION_DIR``: This is where your documents will be
      dumped to be consumed by Paperless.
-   * ``PAPERLESS_PASSPHRASE``: this is the passphrase Paperless uses to
-     encrypt/decrypt the original document.
+   * ``PAPERLESS_PASSPHRASE``: This is the passphrase Paperless uses to
+     encrypt/decrypt the original document. It's only required if you want
+     your original files to be encrypted, otherwise, just leave it unset.
    * ``PAPERLESS_EMAIL_SECRET``: this is the "magic word" used when consuming
      documents from mail or via the API. If you don't use either, leaving it
      blank is just fine.

docs/utilities.rst

Lines changed: 2 additions & 2 deletions
@@ -59,8 +59,8 @@ for documents to parse and index. The process is pretty straightforward:
 4. Attempt to automatically assign document attributes by doing some guesswork.
    Read up on the :ref:`guesswork documentation<guesswork>` for more
    information about this process.
-5. Encrypt the document and store it in the ``media`` directory under
-   ``documents/originals``.
+5. Encrypt the document (if you have a passphrase set) and store it in the
+   ``media`` directory under ``documents/originals``.
 6. Go to #1.


paperless.conf.example

Lines changed: 8 additions & 8 deletions
@@ -59,19 +59,19 @@ PAPERLESS_EMAIL_SECRET=""
 #### Security ####
 ###############################################################################

-# You must have a passphrase in order for Paperless to work at all. If you set
-# this to "", GNUGPG will "encrypt" your PDF by writing it out as a zero-byte
-# file.
-#
-# The passphrase you use here will be used when storing your documents in
-# Paperless, but you can always export them in an unencrypted format by using
-# document exporter. See the documentation for more information.
+# Paperless can be instructed to attempt to encrypt your PDF files with GPG
+# using the PAPERLESS_PASSPHRASE specified below. If however you're not
+# concerned about encrypting these files (for example if you have disk
+# encryption locally) then you don't need this and can safely leave this value
+# un-set.
 #
 # One final note about the passphrase. Once you've consumed a document with
 # one passphrase, DON'T CHANGE IT. Paperless assumes this to be a constant and
 # can't properly export documents that were encrypted with an old passphrase if
 # you've since changed it to a new one.
-PAPERLESS_PASSPHRASE="secret"
+#
+# The default is to not use encryption at all.
+#PAPERLESS_PASSPHRASE="secret"


 # The secret key has a default that should be fine so long as you're hosting
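
The config comments above imply a simple rule: no passphrase, no encryption. A minimal sketch of that rule, assuming the passphrase is read from the process environment and that an empty string counts as unset. The helper name `storage_type_from_env` and the two string constants are hypothetical illustrations, not part of Paperless.

```python
import os

# Hypothetical labels, chosen to mirror the "gpg" / "unencrypted" choices
# that appear elsewhere in this commit.
STORAGE_TYPE_UNENCRYPTED = "unencrypted"
STORAGE_TYPE_GPG = "gpg"


def storage_type_from_env(environ=None):
    """Return the storage type implied by PAPERLESS_PASSPHRASE."""
    environ = os.environ if environ is None else environ
    passphrase = environ.get("PAPERLESS_PASSPHRASE")
    # An unset or empty passphrase is treated as "do not encrypt".
    if passphrase:
        return STORAGE_TYPE_GPG
    return STORAGE_TYPE_UNENCRYPTED
```

Passing a plain dict instead of `os.environ` keeps the function easy to exercise without touching the real environment.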

src/documents/__init__.py

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+from .checks import changed_password_check

src/documents/checks.py

Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
+import textwrap
+
+from django.conf import settings
+from django.core.checks import Error, register
+from django.db.utils import OperationalError
+
+
+@register()
+def changed_password_check(app_configs, **kwargs):
+
+    from documents.models import Document
+    from paperless.db import GnuPG
+
+    try:
+        encrypted_doc = Document.objects.filter(
+            storage_type=Document.STORAGE_TYPE_GPG).first()
+    except OperationalError:
+        return []  # No documents table yet
+
+    if encrypted_doc:
+
+        if not settings.PASSPHRASE:
+            return [Error(
+                "The database contains encrypted documents but no password "
+                "is set."
+            )]
+
+        if not GnuPG.decrypted(encrypted_doc.source_file):
+            return [Error(textwrap.dedent(
+                """
+                The current password doesn't match the password of the
+                existing documents.
+
+                If you intend to change your password, you must first export
+                all of the old documents, start fresh with the new password
+                and then re-import them."
+                """))]
+
+    return []
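
The check above boils down to a three-way decision. The following is a framework-free sketch of that decision only, so it can be read and run without Django. The function name `passphrase_problems` and its parameters are hypothetical; the real check queries the `Document` model and calls `GnuPG.decrypted` as shown in the diff.

```python
def passphrase_problems(has_encrypted_docs, passphrase, can_decrypt):
    """Return a list of human-readable problems, empty when all is well.

    can_decrypt is a callable that takes the passphrase and returns True
    when a sample encrypted document decrypts successfully.
    """
    if not has_encrypted_docs:
        # Nothing is encrypted, so the passphrase is irrelevant.
        return []
    if not passphrase:
        return [
            "The database contains encrypted documents but no password "
            "is set."
        ]
    if not can_decrypt(passphrase):
        return [
            "The current password doesn't match the password of the "
            "existing documents."
        ]
    return []
```

Injecting `can_decrypt` as a callable is only a convenience for this sketch: it keeps the decision logic separate from any real GPG call.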

src/documents/consumer.py

Lines changed: 18 additions & 13 deletions
@@ -29,7 +29,7 @@ class Consumer:
     Loop over every file found in CONSUMPTION_DIR and:
       1. Convert it to a greyscale pnm
       2. Use tesseract on the pnm
-      3. Encrypt and store the document in the MEDIA_ROOT
+      3. Store the document in the MEDIA_ROOT with optional encryption
       4. Store the OCR'd text in the database
       5. Delete the document and image(s)
     """
@@ -50,6 +50,10 @@ def __init__(self, consume=settings.CONSUMPTION_DIR,

         os.makedirs(self.scratch, exist_ok=True)

+        self.storage_type = Document.STORAGE_TYPE_UNENCRYPTED
+        if settings.PASSPHRASE:
+            self.storage_type = Document.STORAGE_TYPE_GPG
+
         if not self.consume:
             raise ConsumerError(
                 "The CONSUMPTION_DIR settings variable does not appear to be "
@@ -213,7 +217,8 @@ def _store(self, text, doc, thumbnail, date):
                 file_type=file_info.extension,
                 checksum=hashlib.md5(f.read()).hexdigest(),
                 created=created,
-                modified=created
+                modified=created,
+                storage_type=self.storage_type
             )

         relevant_tags = set(list(Tag.match_all(text)) + list(file_info.tags))
@@ -222,22 +227,22 @@ def _store(self, text, doc, thumbnail, date):
             self.log("debug", "Tagging with {}".format(tag_names))
             document.tags.add(*relevant_tags)

-        # Encrypt and store the actual document
-        with open(doc, "rb") as unencrypted:
-            with open(document.source_path, "wb") as encrypted:
-                self.log("debug", "Encrypting the document")
-                encrypted.write(GnuPG.encrypted(unencrypted))
-
-        # Encrypt and store the thumbnail
-        with open(thumbnail, "rb") as unencrypted:
-            with open(document.thumbnail_path, "wb") as encrypted:
-                self.log("debug", "Encrypting the thumbnail")
-                encrypted.write(GnuPG.encrypted(unencrypted))
+        self._write(document, doc, document.source_path)
+        self._write(document, thumbnail, document.thumbnail_path)

         self.log("info", "Completed")

         return document

+    def _write(self, document, source, target):
+        with open(source, "rb") as read_file:
+            with open(target, "wb") as write_file:
+                if document.storage_type == Document.STORAGE_TYPE_UNENCRYPTED:
+                    write_file.write(read_file.read())
+                    return
+                self.log("debug", "Encrypting")
+                write_file.write(GnuPG.encrypted(read_file))
+
     def _cleanup_doc(self, doc):
         self.log("debug", "Deleting document {}".format(doc))
         os.unlink(doc)
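
The new `_write` helper copies the file as-is for unencrypted storage and passes it through GPG otherwise. A self-contained sketch of the same branching, with loud assumptions: `toy_encrypt` is a byte-reversing stand-in and NOT real encryption, and `write_document`, `demo`, and the string constants are hypothetical names used only for illustration.

```python
import os
import tempfile

# Hypothetical labels mirroring the two storage types in this commit.
STORAGE_TYPE_UNENCRYPTED = "unencrypted"
STORAGE_TYPE_GPG = "gpg"


def toy_encrypt(data):
    """Stand-in for real encryption: reverses the bytes. NOT secure."""
    return data[::-1]


def write_document(storage_type, source, target, encrypt=toy_encrypt):
    """Copy source to target, transforming it only for encrypted storage."""
    with open(source, "rb") as read_file, open(target, "wb") as write_file:
        data = read_file.read()
        if storage_type == STORAGE_TYPE_GPG:
            data = encrypt(data)
        write_file.write(data)


def demo():
    """Write the same bytes both ways and return what landed on disk."""
    with tempfile.TemporaryDirectory() as scratch:
        source = os.path.join(scratch, "in.pdf")
        with open(source, "wb") as f:
            f.write(b"%PDF-demo")
        results = {}
        for storage_type in (STORAGE_TYPE_UNENCRYPTED, STORAGE_TYPE_GPG):
            target = os.path.join(scratch, "out." + storage_type)
            write_document(storage_type, source, target)
            with open(target, "rb") as f:
                results[storage_type] = f.read()
        return results
```

Calling `demo()` returns a dict keyed by storage type, which makes it easy to confirm that only the encrypted branch alters the bytes.
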
Lines changed: 119 additions & 0 deletions
@@ -0,0 +1,119 @@
+import os
+
+from django.conf import settings
+from django.core.management.base import BaseCommand, CommandError
+from termcolor import colored as coloured
+
+from documents.models import Document
+from paperless.db import GnuPG
+
+
+class Command(BaseCommand):
+
+    help = (
+        "This is how you migrate your stored documents from an encrypted "
+        "state to an unencrypted one (or vice-versa)"
+    )
+
+    def add_arguments(self, parser):
+
+        parser.add_argument(
+            "from",
+            choices=("gpg", "unencrypted"),
+            help="The state you want to change your documents from"
+        )
+        parser.add_argument(
+            "to",
+            choices=("gpg", "unencrypted"),
+            help="The state you want to change your documents to"
+        )
+        parser.add_argument(
+            "--passphrase",
+            help="If PAPERLESS_PASSPHRASE isn't set already, you need to "
+                 "specify it here"
+        )
+
+    def handle(self, *args, **options):
+
+        try:
+            print(coloured(
+                "\n\nWARNING: This script is going to work directly on your "
+                "document originals, so\nWARNING: you probably shouldn't run "
+                "this unless you've got a recent backup\nWARNING: handy. It "
+                "*should* work without a hitch, but be safe and backup your\n"
+                "WARNING: stuff first.\n\nHit Ctrl+C to exit now, or Enter to "
+                "continue.\n\n",
+                "yellow",
+                attrs=("bold",)
+            ))
+            __ = input()
+        except KeyboardInterrupt:
+            return
+
+        if options["from"] == options["to"]:
+            raise CommandError(
+                'The "from" and "to" values can\'t be the same.'
+            )
+
+        passphrase = options["passphrase"] or settings.PASSPHRASE
+        if not passphrase:
+            raise CommandError(
+                "Passphrase not defined. Please set it with --passphrase or "
+                "by declaring it in your environment or your config."
+            )
+
+        if options["from"] == "gpg" and options["to"] == "unencrypted":
+            self.__gpg_to_unencrypted(passphrase)
+        elif options["from"] == "unencrypted" and options["to"] == "gpg":
+            self.__unencrypted_to_gpg(passphrase)
+
+    @staticmethod
+    def __gpg_to_unencrypted(passphrase):
+
+        encrypted_files = Document.objects.filter(
+            storage_type=Document.STORAGE_TYPE_GPG)
+
+        for document in encrypted_files:
+
+            print(coloured("Decrypting {}".format(document), "green"))
+
+            old_paths = [document.source_path, document.thumbnail_path]
+            raw_document = GnuPG.decrypted(document.source_file, passphrase)
+            raw_thumb = GnuPG.decrypted(document.thumbnail_file, passphrase)
+
+            document.storage_type = Document.STORAGE_TYPE_UNENCRYPTED
+
+            with open(document.source_path, "wb") as f:
+                f.write(raw_document)
+
+            with open(document.thumbnail_path, "wb") as f:
+                f.write(raw_thumb)
+
+            document.save(update_fields=("storage_type",))
+
+            for path in old_paths:
+                os.unlink(path)
+
+    @staticmethod
+    def __unencrypted_to_gpg(passphrase):
+
+        unencrypted_files = Document.objects.filter(
+            storage_type=Document.STORAGE_TYPE_UNENCRYPTED)
+
+        for document in unencrypted_files:
+
+            print(coloured("Encrypting {}".format(document), "green"))
+
+            old_paths = [document.source_path, document.thumbnail_path]
+            with open(document.source_path, "rb") as raw_document:
+                with open(document.thumbnail_path, "rb") as raw_thumb:
+                    document.storage_type = Document.STORAGE_TYPE_GPG
+                    with open(document.source_path, "wb") as f:
+                        f.write(GnuPG.encrypted(raw_document, passphrase))
+                    with open(document.thumbnail_path, "wb") as f:
+                        f.write(GnuPG.encrypted(raw_thumb, passphrase))
+
+            document.save(update_fields=("storage_type",))
+
+            for path in old_paths:
+                os.unlink(path)
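
Both conversion methods above record the old paths, write the converted files, and only then unlink the originals. A small sketch of that ordering for the decrypt direction, under two assumptions not confirmed by this diff: that encrypted files carry a `.gpg` suffix (suggested only by the `.gitignore` patterns), and that a byte-reversing `toy_decrypt` stands in for real GPG decryption. The names `decrypt_in_place` and `demo` are hypothetical.

```python
import os
import tempfile


def toy_decrypt(data):
    """Stand-in for real decryption: reverses the bytes. NOT secure."""
    return data[::-1]


def decrypt_in_place(encrypted_path, decrypt=toy_decrypt):
    """Write the decrypted sibling first, then remove the encrypted file."""
    # Assumption: encrypted files are named "<plain name>.gpg".
    if not encrypted_path.endswith(".gpg"):
        raise ValueError("expected a path ending in .gpg")
    plain_path = encrypted_path[: -len(".gpg")]
    with open(encrypted_path, "rb") as f:
        plain = decrypt(f.read())
    with open(plain_path, "wb") as f:
        f.write(plain)
    # Delete the original only after the new file has been written.
    os.unlink(encrypted_path)
    return plain_path


def demo():
    """Round-trip one toy file; report its new name, bytes, and cleanup."""
    with tempfile.TemporaryDirectory() as scratch:
        encrypted_path = os.path.join(scratch, "0000001.pdf.gpg")
        with open(encrypted_path, "wb") as f:
            f.write(b"%PDF-demo"[::-1])
        plain_path = decrypt_in_place(encrypted_path)
        with open(plain_path, "rb") as f:
            content = f.read()
        return (
            os.path.basename(plain_path),
            content,
            os.path.exists(encrypted_path),
        )
```

Deleting last means a failure partway through leaves the original file in place, which is the reason the command's warning still recommends a backup rather than relying on this ordering alone.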
