Commit 1fbf4e3

Complete support for VM image extraction #16
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
1 parent 4b519f8 commit 1fbf4e3

13 files changed (+474, -114 lines changed)

CHANGELOG.rst

Lines changed: 8 additions & 31 deletions
@@ -1,36 +1,13 @@
-Release notes
-=============
+Changelog
+=========

-vNext
------
+v (next)
+--------------

+- Add support for VMDK, QCOW and VDI VM image filesystems extraction

-Version 21.1.21
----------------

-- Bump dependencies and use latest typecode and binaries. This is to fix
-  installation problems on multiple OSes.
+v20.10
+------

-
-Version 21.1.21
----------------
-
-- Add new [full] extra requires that install all the dependencies
-- Fix bug related to commoncode libraries loading
-- Improve the extra requirements
-- Set minimum version for dependencies
-- Improve documentation
-
-
-Version 21.1.15
----------------
-
-- Drop support for Python 2
-- Use the latest CommonCode and TypeCode libraries
-- Add azure-pipelines CI support
-
-
-Version 20.10
--------------
-
-- Initial release.
+- Initial release as a split from ScanCode toolkit

README.rst

Lines changed: 59 additions & 9 deletions
@@ -7,17 +7,26 @@ ExtractCode
 - keywords: archive, extraction, libarchive, 7zip, scancode-toolkit


-ExtractCode is a universal archive extractor. It uses behind the scenes
-the Python standard library, a custom ctypes binding to libarchive and
-the 7zip command line to extract a large number of common and
-less common archives and compressed files. It tries to extract things
-in the same way on all OSes, including auto-renaming files that would
-not have valid names on certain filesystems or when there are multiple
-copies of the same path in a given archive.
+ExtractCode is a universal archive extractor. It uses behind the scenes
+multiple tools such as:
+
+- the Python standard library,
+- a custom ctypes binding to libarchive,
+- the 7zip command line
+- optionally libguestfs on Linux
+
+With these it is possible to extract a large number of common and
+
+less common archives and compressed files. ExtractCode tries to extract things
+in the same way on all OSes, including auto-renaming files that would not have
+valid names on certain filesystems or when there are multiple copies of the same
+path in a given archive (which is possible in a tar).
+
 The extraction is driven from a "voting" system that considers the
-file extension(s) and name, the file type and mime type (using a ctypes
+file extension(s) and name, the filetype and mimetype (using a ctypes
 binding to libmagic) to select the most appropriate extractor or
-uncompressor function. It can handle multi-level archives such as tar.gz.
+decompressor function. It can handle multi-level archives such as tar.gz and
+can extract recursively nested archives.



@@ -36,3 +45,44 @@ To clean up development environment::
     ./configure --clean


+To run the command line tool in the activated environment::
+
+    ./extractcode -h
+
+
+Adding support for VM images
+----------------------------
+
+Adding support for VM images requires the manual installation of libguestfs and
+its Python binding. You will need to install the libguestfs tools system package.
+On Debian and Ubuntu::
+
+    sudo apt-get install libguestfs-tools
+
+
+On Ubuntu, a manual step is required if the kernel executable file cannot be read.
+guestfish and libguestfs need to read the kernel; this is an Ubuntu oddity that does not occur on Debian.
+
+Run this command as a temporary fix::
+
+    for k in /boot/vmlinuz-*
+        do sudo dpkg-statoverride --add --update root root 0644 "$k"
+    done
+
+or::
+
+    sudo chmod +r /boot/vmlinuz-*
+
+
+For a permanent fix see:
+
+- https://bugs.launchpad.net/ubuntu/+source/libguestfs/+bug/1813662/comments/21
+
+See also for a discussion:
+
+- https://bugs.launchpad.net/ubuntu/+source/linux/+bug/759725
+- https://bugzilla.redhat.com/show_bug.cgi?id=1670790
+- https://bugs.launchpad.net/ubuntu/+source/libguestfs/+bug/1813662
+
+
+
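
Review note: the Ubuntu workaround in this README hunk only matters when the kernel images under /boot are not readable by the current user. The following is a minimal sketch of how that condition could be checked before attempting a VM image extraction. The function name `unreadable_kernels` and its default glob pattern are assumptions made for illustration; they are not part of ExtractCode or libguestfs.

```python
import glob
import os


def unreadable_kernels(pattern='/boot/vmlinuz-*'):
    """
    Return a sorted list of the kernel image paths matching `pattern` that
    the current user cannot read. An empty list means that no permission fix
    is needed, or that no file matched the pattern.
    """
    return sorted(
        path for path in glob.glob(pattern)
        if not os.access(path, os.R_OK)
    )


if __name__ == '__main__':
    # Report each kernel image that would need the chmod or
    # dpkg-statoverride workaround described in the README.
    for path in unreadable_kernels():
        print(f'Not readable: {path}')
```

Note that a process running as root typically passes the `os.access` check for every file, so this sketch is only informative for a regular user account.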

azure-pipelines.yml

Lines changed: 5 additions & 2 deletions
@@ -21,15 +21,18 @@ jobs:
           image_name: ubuntu-18.04
           python_versions: ['3.6', '3.7', '3.8', '3.9']
           test_suites:
-              all: tmp/bin/pytest -n 2 -vvs
+              all:
+                  - apt-get install libguestfs-tools
+                  - tmp/bin/pytest -n 2 -vvs

     - template: etc/ci/azure-linux.yml
       parameters:
           job_name: ubuntu20_cpython
           image_name: ubuntu-20.04
           python_versions: ['3.6', '3.7', '3.8', '3.9']
           test_suites:
-              all: tmp/bin/pytest -n 2 -vvs
+              all:
+                  - tmp/bin/pytest -n 2 -vvs

     - template: etc/ci/azure-mac.yml
       parameters:

setup.cfg

Lines changed: 1 addition & 0 deletions
@@ -35,6 +35,7 @@ packages = find:
 include_package_data = true
 zip_safe = false
 install_requires =
+    attrs >= 18.1, !=20.1.0
     commoncode >= 21.1.21
     plugincode >= 21.1.21
     typecode >= 21.2.23

src/extractcode/archive.py

Lines changed: 8 additions & 7 deletions
@@ -221,10 +221,11 @@ def get_handlers(location):
         extension_matched = exts and location.lower().endswith(exts)

         if TRACE_DEEP:
-            logger.debug('          get_handlers: matched type: %(type_matched)s, mime: %(mime_matched)s, ext: %(extension_matched)s' % locals())
+            print(f'          get_handlers: matched type: {type_matched}, mime: {mime_matched}, ext: {extension_matched}')

-        if handler.strict and not all([type_matched, mime_matched, extension_matched]):
-            logger.debug('          get_handlers: skip strict' % locals())
+        if handler.strict and not (type_matched and mime_matched and extension_matched):
+            if TRACE_DEEP:
+                print(f'          get_handlers: skip strict: {handler.name}')
             continue

         if type_matched or mime_matched or extension_matched:
@@ -1052,8 +1053,8 @@ def try_to_extract(location, target_dir, extractor):
     mimetypes=('application/octet-stream',),
     extensions=('.qcow2',),
     kind=file_system,
-    extractors=[extract_vm_image, extract_tar],
-    strict=False,
+    extractors=[extract_vm_image],
+    strict=True,
 )

 VMDKHandler = Handler(
@@ -1062,7 +1063,7 @@ def try_to_extract(location, target_dir, extractor):
     mimetypes=('application/octet-stream',),
     extensions=('.vmdk',),
     kind=file_system,
-    extractors=[extract_vm_image, extract_tar],
+    extractors=[extract_vm_image],
     strict=True,
 )

@@ -1072,7 +1073,7 @@ def try_to_extract(location, target_dir, extractor):
     mimetypes=('application/octet-stream',),
     extensions=('.vdi',),
     kind=file_system,
-    extractors=[extract_vm_image, extract_tar],
+    extractors=[extract_vm_image],
     strict=True,
 )
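
Review note: the `get_handlers` hunk in this file makes a strict handler eligible only when the file type, the mime type and the extension all match, while a non-strict handler needs just one of the three. The sketch below is a simplified illustration of that rule. `SketchHandler` and `is_candidate` are hypothetical names invented for this example and are not the actual ExtractCode API.

```python
from dataclasses import dataclass


@dataclass
class SketchHandler:
    # Hypothetical stand-in for an ExtractCode Handler, for illustration only.
    name: str
    strict: bool = False


def is_candidate(handler, type_matched, mime_matched, extension_matched):
    """
    Return True if `handler` should be considered for a file, given which of
    the three signals (file type, mime type, extension) matched.
    """
    matches = (type_matched, mime_matched, extension_matched)
    if handler.strict:
        # A strict handler requires every signal to match.
        return all(matches)
    # A non-strict handler only needs one matching signal.
    return any(matches)


if __name__ == '__main__':
    vm_image = SketchHandler('qcow2 image', strict=True)
    # The mime type and extension match but the file type does not,
    # so the strict handler is skipped.
    print(is_candidate(vm_image, False, True, True))
```

With the VM image handlers declaring the generic `application/octet-stream` mime type, marking them strict presumably keeps them from being selected for arbitrary binary files that lack the expected extension or file type.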

src/extractcode/extract.py

Lines changed: 2 additions & 2 deletions
@@ -30,7 +30,7 @@
 from commoncode import fileutils
 from commoncode import ignore

-import extractcode
+import extractcode  # NOQA
 import extractcode.archive

 logger = logging.getLogger(__name__)
@@ -61,7 +61,7 @@

 - Symlinks may be replaced by plain file copies as if they were regular files.
   Hardlinks may be recreated as regular files, not as hardlinks to the original
-  file.
+  files.

 - Files and directories may be renamed when their name is a duplicate. And a
   name may be considered a duplicate ignore upper and lower case mixes even
