diff --git a/modules/imgaug/CMakeLists.txt b/modules/imgaug/CMakeLists.txt new file mode 100644 index 00000000000..7f3e19b6690 --- /dev/null +++ b/modules/imgaug/CMakeLists.txt @@ -0,0 +1,2 @@ +set(the_description "Data Augmentation Module") +ocv_define_module(imgaug opencv_imgproc opencv_core opencv_imgcodecs opencv_highgui WRAP python) diff --git a/modules/imgaug/LICENSE b/modules/imgaug/LICENSE new file mode 100644 index 00000000000..d6456956733 --- /dev/null +++ b/modules/imgaug/LICENSE @@ -0,0 +1,202 @@ + + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). + + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. + + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. 
For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. Grant of Patent License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. 
The contents + of the NOTICE file are for informational purposes only and + do not modify the License. You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. + + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. Limitation of Liability. In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. 
+ + END OF TERMS AND CONDITIONS + + APPENDIX: How to apply the Apache License to your work. + + To apply the Apache License to your work, attach the following + boilerplate notice, with the fields enclosed by brackets "[]" + replaced with your own identifying information. (Don't include + the brackets!) The text should be enclosed in the appropriate + comment syntax for the file format. We also recommend that a + file or class name and description of purpose be included on the + same "printed page" as the copyright notice for easier + identification within third-party archives. + + Copyright [yyyy] [name of copyright owner] + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. diff --git a/modules/imgaug/include/opencv2/imgaug.hpp b/modules/imgaug/include/opencv2/imgaug.hpp new file mode 100644 index 00000000000..0781e57aa46 --- /dev/null +++ b/modules/imgaug/include/opencv2/imgaug.hpp @@ -0,0 +1,19 @@ +// This file is part of OpenCV project. +// It is subject to the license terms in the LICENSE file found in the top-level directory +// of this distribution and at http://opencv.org/license.html. +#ifndef OPENCV_IMGAUG_HPP +#define OPENCV_IMGAUG_HPP + +#include "opencv2/imgaug/transforms.hpp" +#include "opencv2/imgaug/transforms_det.hpp" +#include "opencv2/imgaug/functional.hpp" +#include "opencv2/imgaug/rng.hpp" + +/** @defgroup imgaug Data Augmentation Module for Efficient Data Preprocessing + * @{ + * @defgroup det Data Augmentation for Object Detection + * @} +*/ + + +#endif \ No newline at end of file diff --git a/modules/imgaug/include/opencv2/imgaug/functional.hpp b/modules/imgaug/include/opencv2/imgaug/functional.hpp new file mode 100644 index 00000000000..8902b5e0c5a --- /dev/null +++ b/modules/imgaug/include/opencv2/imgaug/functional.hpp @@ -0,0 +1,46 @@ +// This file is part of OpenCV project. +// It is subject to the license terms in the LICENSE file found in the top-level directory +// of this distribution and at http://opencv.org/license.html. +#ifndef OPENCV_AUG_FUNCTIONAL_HPP +#define OPENCV_AUG_FUNCTIONAL_HPP +#include +#include + +namespace cv { + //! @addtogroup imgaug + //! @{ + + /** @brief Adjust the brightness of the given image. + * + * @param img Source image. This operation is inplace. + * @param brightness_factor brightness factor which controls the brightness of the adjusted image. + * Brightness factor should be >= 0. When brightness factor is larger than 1, the output image will be brighter than original. + * When brightness factor is less than 1, the output image will be darker than original. + */ + void adjustBrightness(Mat& img, double brightness_factor); + + /** @brief Adjust the contrast of the given image. + * + * @param img Source image. This operation is inplace. + * @param contrast_factor contrast factor should be larger than 1. It controls the contrast of the adjusted image. + */ + void adjustContrast(Mat& img, double contrast_factor); + + /** @brief Adjust the saturation of the given image. + * + * @param img Source image. This operation is inplace. 
+ * @param saturation_factor saturation factor should be larger than 1. It controls the saturation of the adjusted image. + */ + void adjustSaturation(Mat& img, double saturation_factor); + + /** @brief Adjust the hue of the given image. + * + * @param img Source image. This operation is inplace. + * @param hue_factor hue factor should be in range [-1, 1]. It controls the hue of the adjusted image. + */ + void adjustHue(Mat& img, double hue_factor); + + //! @} +}; + +#endif \ No newline at end of file diff --git a/modules/imgaug/include/opencv2/imgaug/rng.hpp b/modules/imgaug/include/opencv2/imgaug/rng.hpp new file mode 100644 index 00000000000..7f283298218 --- /dev/null +++ b/modules/imgaug/include/opencv2/imgaug/rng.hpp @@ -0,0 +1,35 @@ +// This file is part of OpenCV project. +// It is subject to the license terms in the LICENSE file found in the top-level directory +// of this distribution and at http://opencv.org/license.html. +#ifndef OPENCV_AUG_RNG_HPP +#define OPENCV_AUG_RNG_HPP + + + +namespace cv{ + + namespace imgaug{ + //! @addtogroup imgaug + //! @{ + + //! Initial state of the random number generator cv::imgaug::rng. If you don't manually set it using cv::imgaug::setSeed, + //! it will be set to the current tick count returned by cv::getTickCount. + extern uint64 state; + + //! Random number generator for data augmentation module + extern cv::RNG rng; + + /** @brief Manually set the initial state of the random number generator cv::imgaug::rng. + * + * @param seed The seed value needed to generate a random number. + */ + CV_EXPORTS_W void setSeed(uint64 seed); + + //! @} + } +} + + + + +#endif //OPENCV_AUG_RNG_HPP diff --git a/modules/imgaug/include/opencv2/imgaug/transforms.hpp b/modules/imgaug/include/opencv2/imgaug/transforms.hpp new file mode 100644 index 00000000000..269e2f2bf51 --- /dev/null +++ b/modules/imgaug/include/opencv2/imgaug/transforms.hpp @@ -0,0 +1,432 @@ +// This file is part of OpenCV project. +// It is subject to the license terms in the LICENSE file found in the top-level directory +// of this distribution and at http://opencv.org/license.html. +#ifndef OPENCV_AUG_TRANSFORMS_HPP +#define OPENCV_AUG_TRANSFORMS_HPP + +#include +#include +#include + + +namespace cv{ + //! Data augmentation module + namespace imgaug{ + + //! @addtogroup imgaug + //! @{ + + //! Base class for all data augmentation classes. + class CV_EXPORTS_W Transform{ + public: + CV_WRAP virtual void call(InputArray src, OutputArray dst) const = 0; + CV_WRAP virtual ~Transform() = default; + }; + + //! Combine a series of data augmentation methods into one and apply them sequentially. + class CV_EXPORTS_W Compose{ + public: + /** @brief Initialize the Compose class by passing a series of data augmentation you want to apply. + * + * @param transforms Series of data augmentation methods. All data augmentation classes should inherited from cv::imgaug::Transform. + */ + CV_WRAP explicit Compose(std::vector >& transforms); + /** @brief Call composed data augmentation methods, apply them to the input image sequentially. + * + * @param src Source image. + * @param dst Destination image. + * + * @note Some data augmentation methods only support images in certain formats. + */ + CV_WRAP void call(InputArray src, OutputArray dst) const; + + //! vector of the pointers to the data augmentation instances. + std::vector > transforms; + }; + + //! Crop the given image at a random location + class CV_EXPORTS_W RandomCrop: public Transform{ + public: + /** @brief Initialize the RandomCrop class. 
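+ *
+ * A minimal usage sketch (the file name and crop size below are illustrative only, not part of the API):
+ * @code
+ * cv::Mat src = cv::imread("input.jpg"), dst;
+ * cv::imgaug::RandomCrop crop(cv::Size(100, 100));
+ * crop.call(src, dst);
+ * @endcode
+ *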
+ * + * @param sz Size of the cropped image. + * @param padding Padding on the borders of the source image. Four element tuple needs to be provided, + * which is the padding for the top, bottom, left and right respectively. By default no padding is added. + * @param pad_if_need When the cropped size is smaller than the source image (with padding), exception will raise. + * Set this value to true to automatically pad the image to avoid this exception. + * @param fill Fill value of the padded pixels. By default is 0. + * @param padding_mode Type of padding. Default is #BORDER_CONSTANT, see #BorderTypes for details. + */ + CV_WRAP explicit RandomCrop(const Size& sz, const Vec4i& padding=Vec4i(0,0,0,0), bool pad_if_need=false, int fill=0, int padding_mode=BORDER_CONSTANT); + + CV_WRAP ~RandomCrop() override = default; + + /** @brief Apply augmentation method on source image, this operation is not inplace. + * + * @param src Source image. + * @param dst Destination image. + */ + CV_WRAP void call(InputArray src, OutputArray dst) const override; + + Size sz; + Vec4i padding; + bool pad_if_need; + int fill; + int padding_mode; + }; + + //! Flip the image randomly along specified axes. + class CV_EXPORTS_W RandomFlip: public Transform{ + public: + /** Initialize the RandomFlip class. + * + * @param flipCode flipCode to specify the axis along which image is flipped. Set + * 0 for vertical axis, positive for horizontal axis, negative for both axes. + * \f[\texttt{dst} _{ij} = + \left\{ + \begin{array}{l l} + \texttt{src} _{\texttt{src.rows}-i-1,j} & if\; \texttt{flipCode} = 0 \\ + \texttt{src} _{i, \texttt{src.cols} -j-1} & if\; \texttt{flipCode} > 0 \\ + \texttt{src} _{ \texttt{src.rows} -i-1, \texttt{src.cols} -j-1} & if\; \texttt{flipCode} < 0 \\ + \end{array} + \right.\f] + * @param p Probability to apply this method. p should be in range 0 to 1, larger p denotes higher probability. + */ + CV_WRAP explicit RandomFlip(int flipCode=0, double p=0.5); + + CV_WRAP ~RandomFlip() override = default; + + /** @brief Apply augmentation method on source image, this operation is inplace. + * + * @param src Source image. + * @param dst Destination image. + */ + CV_WRAP void call(InputArray src, OutputArray dst) const override; + + int flipCode; + double p; + }; + + //! Resize the image to specified size + class CV_EXPORTS_W Resize: public Transform{ + public: + /** @brief Initialize the Resize class. + * + * @param sz Size of the resized image. + * @param interpolation Interpolation mode. Refer to #InterpolationFlags for more details. + */ + CV_WRAP explicit Resize(const Size& sz, int interpolation=INTER_LINEAR); + + CV_WRAP ~Resize() override = default; + + /** @brief Apply augmentation method on source image. This operation is not inplace. + * + * @param src Source image. + * @param dst Destination image. + */ + CV_WRAP void call(InputArray src, OutputArray dst) const override; + + Size sz; + int interpolation; + }; + + //! Crop the given image at the center + class CV_EXPORTS_W CenterCrop : public Transform { + public: + /** @brief Initialize the CenterCrop class. + * + * @param size Size of the cropped image. + */ + CV_WRAP explicit CenterCrop(const Size& size); + + CV_WRAP ~CenterCrop() override = default; + + /** @brief Apply augmentation method on source image. This operation is not inplace. + * + * @param src Source image. + * @param dst Destination image. + */ + CV_WRAP void call(InputArray src, OutputArray dst) const override; + + Size size; + }; + + //! Pad the given image on the borders. 
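+ //! A minimal usage sketch (padding and fill values are illustrative; cv::Mat src and dst are assumed to exist):
+ //! @code
+ //! cv::imgaug::Pad pad(cv::Vec4i(10, 10, 10, 10), cv::Scalar(0, 0, 0));
+ //! pad.call(src, dst);
+ //! @endcode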
+ class CV_EXPORTS_W Pad : public Transform{ + public: + /** Initialize the Pad class. + * + * @param padding Padding on the borders of the source image. Four-elements tuple needs to be provided, + * which is the padding for the top, bottom, left and right respectively. + * @param fill Fill value of the padded pixels. By default fill value is 0 for all channels. + * @param padding_mode Type of padding. Default is #BORDER_CONSTANT, see #BorderTypes for details. + */ + CV_WRAP explicit Pad(const Vec4i& padding, const Scalar& fill = Scalar(), int padding_mode = BORDER_CONSTANT); + + CV_WRAP ~Pad() override = default; + + /** @brief Apply augmentation method on source image. This operation is inplace. + * + * @param src Source image. + * @param dst Destination image. + */ + CV_WRAP void call(InputArray src, OutputArray dst) const override; + + Vec4i padding; + const Scalar fill; + int padding_mode; + }; + + //! Crop a random portion of image and resize it to a given size. + class CV_EXPORTS_W RandomResizedCrop : public Transform { + public: + /** @brief Initialize the RandomResizedCrop class. + * + * @param size Expected output size of the destination image. + * @param scale Specify the the lower and upper bounds for the random area of the crop, + before resizing. The scale is defined with respect to the area of the original image. + * @param ratio lower and upper bounds for the random aspect ratio of the crop, before + resizing. + * @param interpolation Interpolation mode. Refer to #InterpolationFlags for more details. + */ + CV_WRAP explicit RandomResizedCrop(const Size& size, const Vec2d& scale = Vec2d(0.08, 1.0), const Vec2d& ratio = Vec2d(3.0 / 4.0, 4.0 / 3.0), int interpolation = INTER_LINEAR); + + /** @brief Apply augmentation method on source image. This operation is not inplace. + * + * @param src Source image. + * @param dst Destination image. + */ + CV_WRAP void call(InputArray src, OutputArray dst) const override; + + Size size; + Vec2d scale; + Vec2d ratio; + int interpolation; + }; + + //! Change the brightness, contrast, saturation and hue of the given image randomly. The activated functions are applied in random order. + class CV_EXPORTS_W ColorJitter : public Transform { + public: + /** Initialize the ColorJitter class. + * + * @param brightness Specify the lower and upper bounds for the brightness factor. + * Brightness factor is >= 0. When brightness factor is 1, the brightness of the augmented image will not be changed. + * When brightness factor is larger, the augmented image is brighter. + * By default this function is disabled. + * You can also pass cv::Vec2d() to disable this function manually. + * @param contrast Specify the lower and upper bounds for the contrast factor. + * Contrast factor is >= 0. When contrast factor is 1, the contrast of the augmented image will not be changed. + * When contrast factor is larger, the contrast of the destination image is larger. + * By default this function is disabled. You can also pass cv::Vec2d() to disable this function manually. + * @param saturation Specify the lower and upper bounds for the saturation factor. + * Saturation factor is >= 0. When saturation factor is 1, the saturation of the augmented image will not be changed. + * When saturation factor is larger, the saturation of the destination image is larger. + * By default this function is disabled. You can also pass cv::Vec2d() to disable this function manually. + * @param hue Specify the lower and upper bounds for the hue factor. 
+ * Hue factor should be in range of -1 to 1. When hue factor is 0, the hue of the augmented image will not be changed. + * By default this function is disabled. You can also pass cv::Vec2d() to disable this function manually. + */ + CV_WRAP explicit ColorJitter(const Vec2d& brightness=Vec2d(), const Vec2d& contrast=Vec2d(), const Vec2d& saturation=Vec2d(), const Vec2d& hue=Vec2d()); + + /** Apply augmentation method on source image. This operation is not inplace. + * + * @param src Source image. + * @param dst Destination image. + */ + CV_WRAP void call(InputArray src, OutputArray dst) const override; + + Vec2d brightness; + Vec2d contrast; + Vec2d saturation; + Vec2d hue; + }; + + //! Rotate the given image by a random degree. + class CV_EXPORTS_W RandomRotation : public Transform { + public: + /** @brief Initialize the RandomRotation class. + * + * @param degrees Specify the lower and upper bounds for the rotation degree. + * @param interpolation Interpolation mode. Refer to #InterpolationFlags for more details. + * @param center Rotation center, origin is the left corner of the image. By default it is set to the center of the image. + * @param fill Fill value for the area outside the rotated image. Default is 0 for all channels. + */ + CV_WRAP explicit RandomRotation(const Vec2d& degrees, int interpolation=INTER_LINEAR, const Point2f& center=Point2f(), const Scalar& fill=Scalar()); + + /** Apply augmentation method on source image. This operation is not inplace. + * + * @param src Source image. + * @param dst Destination image. + */ + CV_WRAP void call(InputArray src, OutputArray dst) const override; + + Vec2d degrees; + int interpolation; + Point2f center; + Scalar fill; + }; + + //! Convert the image into grayscale image of specified channels. + class CV_EXPORTS_W GrayScale : public Transform { + public: + /** @brief Initialize the GrayScale class. + * + * @param num_channels number of the channels of the destination image. All channels are same. + */ + CV_WRAP explicit GrayScale(int num_channels=1); + + /** @brief Apply augmentation method on source image. This operation is not inplace. + * + * @param src Source image. + * @param dst Destination image. + */ + CV_WRAP void call(InputArray src, OutputArray dst) const override; + + int num_channels; + }; + + //! Convert the given image into grayscale given a certain probability. + class CV_EXPORTS_W RandomGrayScale : public Transform { + public: + /** @brief Initialize the RandomGrayScale class. + * + * @param p Probability of turning a image into grayscale. p should be in range 0 to 1. A larger p means a higher probability. + */ + CV_WRAP explicit RandomGrayScale(double p=0.1); + + /** @brief Apply augmentation method on source image. This operation is not inplace. + * + * @param src Source image. + * @param dst Destination image. + */ + CV_WRAP void call(InputArray src, OutputArray dst) const override; + + double p; + }; + + //! Randomly erase a area of the given image. + class CV_EXPORTS_W RandomErasing : public Transform { + public: + /** Initialize the RandomErasing class. + * + * @param p Probability to apply the random erasing operation. + * @param scale Range of proportion of erased area against input image. + * @param ratio Range of aspect ratio of erased area. + * @param value Fill value of the erased area. + * @param inplace If true, erase the area on the source image. + * If false, erase the area on the destination image, which will not affect the source image. 
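+ *
+ * A minimal usage sketch (the probability value is illustrative; cv::Mat src and dst are assumed to exist):
+ * @code
+ * cv::imgaug::RandomErasing erase(0.8);
+ * erase.call(src, dst);
+ * @endcode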
+ */ + CV_WRAP explicit RandomErasing(double p=0.5, const Vec2d& scale=Vec2d(0.02, 0.33), const Vec2d& ratio=Vec2d(0.3, 0.33), const Scalar& value=Scalar(0, 100, 100), bool inplace=false); + + /** @brief Apply augmentation method on source image. + * + * @param src Source image. + * @param dst Destination image. + */ + CV_WRAP void call(InputArray src, OutputArray dst) const override; + + double p; + Vec2d scale; + Vec2d ratio; + Scalar value; + bool inplace; + }; + + + //! Normalize given image with mean and standard deviation. + //! The destination image will be normalized into range 0 to 1 first, + //! then the normalization operation will be applied to each channel of the image. + class CV_EXPORTS_W Normalize : public Transform { + public: + /** @brief Initialize the Normalize class. + * + * @param mean Sequence of means for each channels. + * @param std Sequence of standard deviations for each channels. + * + * @note The image read in OpenCV is of type BGR by default, you should provide the mean and std in order of [B,G,R] if the type of source image is BGR. + */ + CV_WRAP explicit Normalize(const Scalar& mean=Scalar(0,0,0,0), const Scalar& std=Scalar(1,1,1,1)); + + /** @brief Apply augmentation method on source image. This operation is not inplace. + * + * @param src Source image. + * @param dst Destination image. + */ + CV_WRAP void call(InputArray src, OutputArray dst) const override; + + Scalar mean; + Scalar std; + }; + + //! Blurs image with randomly chosen Gaussian blur. + class CV_EXPORTS_W GaussianBlur : public Transform { + public: + /** @brief Initialize the GaussianBlur class. + * + * @param kernel_size Size of the gaussian kernel. + * @param sigma Specify the lower and upper bounds of the standard deviation to be used for creating kernel to perform blurring. + */ + CV_WRAP explicit GaussianBlur(const Size& kernel_size, const Vec2f& sigma=Vec2f(0.1, 2.0)); + + /** @brief Apply augmentation method on source image. This operation is not inplace. + * + * @param src Source image. + * @param dst Destination image. + */ + CV_WRAP void call(InputArray src, OutputArray dst) const override; + + Size kernel_size; + Vec2f sigma; + }; + + //! Apply random affine transformation to the image. + class CV_EXPORTS_W RandomAffine: public Transform{ + public: + /** Initialize the RandomAffine class. + * + * @param degrees Range of rotation degrees to select from. + * @param translations Tuple of maximum absolute fraction for horizontal and vertical translations. By default translation is 0 in both directions. + * @param scales Scaling factor interval. The scale factor is sampled uniformly from the interval. By default scale factor is 1. + * @param shears Range of degrees to select from. Degree along x axis shear_x is sampled from range [shears[0], shear[1]]. Degree along y axis shear_y is sampled from range [shears[2], shear[3]]. By default, shear_x and shear_y are all 0. + * @param interpolation Interpolation mode. Refer to #InterpolationFlags for more details. + * @param fill Fill value of the area outside the transformed image. + * @param center Rotation center. Origin is the left corner of the image. By default it is set to the center of the image. 
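+ *
+ * A minimal usage sketch (the degree range is illustrative; cv::Mat src and dst are assumed to exist):
+ * @code
+ * cv::imgaug::RandomAffine affine(cv::Vec2f(-15.f, 15.f));
+ * affine.call(src, dst);
+ * @endcode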
+ */ + CV_WRAP explicit RandomAffine(const Vec2f& degrees=Vec2f(0., 0.), const Vec2f& translations=Vec2f(0., 0.), const Vec2f& scales=Vec2f(1., 1.), const Vec4f& shears=Vec4f(0., 0., 0., 0.), int interpolation=INTER_NEAREST, const Scalar& fill=Scalar(), const Point2i& center=Point2i(-1, -1)); + + /** @brief Apply augmentation method on source image. This operation is not inplace. + * + * @param src Source image. + * @param dst Destination image. + */ + CV_WRAP void call(InputArray src, OutputArray dst) const override; + + Vec2f degrees; + Vec2f translations; + Vec2f scales; + Vec4f shears; + int interpolation; + Scalar fill; + Point2i center; + + }; + + //! @cond IGNORED + void grayScale(InputArray _src, OutputArray _dst, int num_channels); + void randomCrop(InputArray src, OutputArray dst, const Size& sz, const Vec4i& padding=Vec4i() , bool pad_if_need=false, int fill=0, int padding_mode=BORDER_CONSTANT);CV_EXPORTS_W void randomFlip(InputArray src, OutputArray dst, int flipCode=0, double p=0.5); + void centerCrop(InputArray src, OutputArray dst, const Size& size); + void randomResizedCrop(InputArray src, OutputArray dst, const Size& size, const Vec2d& scale = Vec2d(0.08, 1.0), const Vec2d& ratio = Vec2d(3.0 / 4.0, 4.0 / 3.0), int interpolation = INTER_LINEAR); + void colorJitter(InputArray src, OutputArray dst, const Vec2d& brightness=Vec2d(), const Vec2d& contrast=Vec2d(), const Vec2d& saturation=Vec2d(), const Vec2d& hue=Vec2d()); + void randomRotation(InputArray src, OutputArray dst, const Vec2d& degrees, int interpolation=INTER_LINEAR, const Point2f& center=Point2f(), const Scalar& fill=Scalar(0)); + void randomGrayScale(InputArray src, OutputArray dst, double p=0.1); + void randomErasing(InputArray src, OutputArray dst, double p=0.5, const Vec2d& scale=Vec2d(0.02, 0.33), const Vec2d& ratio=Vec2d(0.3, 0.33), const Scalar& value=Scalar(0, 100, 100), bool inplace=false); + void gaussianBlur(InputArray src, OutputArray dst, const Size& kernel_size, const Vec2f& sigma=Vec2f(0.1, 2.0)); + void randomAffine(InputArray src, OutputArray dst, const Vec2f& degrees=Vec2f(0., 0.), const Vec2f& translations=Vec2f(0., 0.), const Vec2f& scales=Vec2f(1., 1.), const Vec4f& shears=Vec4f(0., 0., 0., 0.), int interpolation=INTER_NEAREST, const Scalar& fill=Scalar(), const Point2i& center=Point2i(-1, -1)); + //! @endcond + + //! @} + + } +} + +#endif diff --git a/modules/imgaug/include/opencv2/imgaug/transforms_det.hpp b/modules/imgaug/include/opencv2/imgaug/transforms_det.hpp new file mode 100644 index 00000000000..b3c8b6bd047 --- /dev/null +++ b/modules/imgaug/include/opencv2/imgaug/transforms_det.hpp @@ -0,0 +1,237 @@ +// This file is part of OpenCV project. +// It is subject to the license terms in the LICENSE file found in the top-level directory +// of this distribution and at http://opencv.org/license.html. +#ifndef OPENCV_TRANSFORMS_DET_HPP +#define OPENCV_TRANSFORMS_DET_HPP + + +namespace cv{ + namespace imgaug{ + namespace det{ + + //! @addtogroup det + //! @{ + + //! Base class for all data augmentation classes for detection task + class CV_EXPORTS_W Transform{ + public: + CV_WRAP virtual void call(InputArray src, OutputArray dst, CV_IN_OUT std::vector& bboxes, CV_IN_OUT std::vector& labels) const = 0; + CV_WRAP virtual ~Transform() = default; + }; + + //! Combine data augmentation methods into one and apply them sequentially to source image and annotation + //! 
All combined data augmentation class must inherited from cv::imgaug::det::Transform + class CV_EXPORTS_W Compose : public Transform{ + public: + /** @brief Initialize Compose class. + * + * @param transforms data augmentation methods used to compose + */ + CV_WRAP explicit Compose(std::vector >& transforms); + + /** @brief Apply data augmentation method on source image and its annotation. + * + * @param src Source image. + * @param dst Destination image. + * @param bboxes Annotation of source image, which consists of several bounding boxes of the detected objects in the source image. + * In Python, the bounding box is represented as a four-elements tuple (x, y, w, h), + * in which x, y is the coordinates of the left top corner of the bounding box and w, h is the width and height of the bounding box. + * @param labels Class labels of the detected objects in source image. The order of the labels should correspond to the order of the bboxes. + */ + CV_WRAP void call(InputArray src, OutputArray dst, CV_IN_OUT std::vector& bboxes, CV_IN_OUT std::vector& labels) const override; + + std::vector > transforms; + }; + + class CV_EXPORTS_W RandomFlip: public Transform{ + public: + /** @brief Initialize the RandomFlip class. + * + * @param flipCode flipCode to specify the axis along which image is flipped. Set + * 0 for vertical axis, positive for horizontal axis, negative for both axes. + * \f[\texttt{dst} _{ij} = + \left\{ + \begin{array}{l l} + \texttt{src} _{\texttt{src.rows}-i-1,j} & if\; \texttt{flipCode} = 0 \\ + \texttt{src} _{i, \texttt{src.cols} -j-1} & if\; \texttt{flipCode} > 0 \\ + \texttt{src} _{ \texttt{src.rows} -i-1, \texttt{src.cols} -j-1} & if\; \texttt{flipCode} < 0 \\ + \end{array} + \right.\f] + * @param p Probability to apply this method. p should be in range 0 to 1, larger p denotes higher probability. + */ + CV_WRAP explicit RandomFlip(int flipCode=0, float p=0.5); + + /** @brief Apply data augmentation method on source image and its annotation. + * + * @param src Source image. + * @param dst Destination image. + * @param bboxes Annotation of source image, which consists of several bounding boxes of the detected objects in the source image. + * In Python, the bounding box is represented as a four-elements tuple (x, y, w, h), + * in which x, y is the coordinates of the left top corner of the bounding box and w, h is the width and height of the bounding box. + * @param labels Class labels of the detected objects in source image. The order of the labels should correspond to the order of the bboxes. + */ + CV_WRAP void call(InputArray src, OutputArray dst, CV_IN_OUT std::vector& bboxes, std::vector& labels) const override; + + /** @brief Flip the annotated bounding boxes. + * + * @param bboxes Bounding box annotations. + * @param size The size of the source image. + */ + void flipBoundingBox(std::vector& bboxes, const Size& size) const; + + int flipCode; + float p; + }; + +// class CV_EXPORTS_W RandomCrop: cv::det::Transform{ +// public: +// CV_WRAP explicit RandomCrop(const Size& sz, const Vec4i& padding=Vec4i() , bool pad_if_need=false, const Scalar& fill=Scalar(), int padding_mode=BORDER_CONSTANT); +// CV_WRAP void call(InputArray src, OutputArray dst, std::vector& target) const; +// +// const Size sz; +// Vec4i padding; +// bool pad_if_need; +// Scalar fill; +// int padding_mode; +// }; + + + //! Resize the source image and its annotations into specified size. 
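+ //! A minimal usage sketch (the target size is illustrative; bboxes and labels are the caller's existing annotation vectors):
+ //! @code
+ //! cv::imgaug::det::Resize resize(cv::Size(224, 224));
+ //! resize.call(src, dst, bboxes, labels);
+ //! @endcode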
+ class CV_EXPORTS_W Resize: public Transform{ + public: + /** @brief Initialize the Resize class + * + * @param size Size of the resized image. + * @param interpolation Interpolation mode when resize image, see #InterpolationFlags for details. + */ + CV_WRAP explicit Resize(const Size& size, int interpolation=INTER_NEAREST); + + /** @brief Apply data augmentation method on source image and its annotation. + * + * @param src Source image. + * @param dst Destination image. + * @param bboxes Annotation of source image, which consists of several bounding boxes of the detected objects in the source image. + * In Python, the bounding box is represented as a four-elements tuple (x, y, w, h), + * in which x, y is the coordinates of the left top corner of the bounding box and w, h is the width and height of the bounding box. + * @param labels Class labels of the detected objects in source image. The order of the labels should correspond to the order of the bboxes. + */ + CV_WRAP void call(InputArray src, OutputArray dst, CV_IN_OUT std::vector& bboxes, std::vector& labels) const override; + + /** @brief Resize the bounding boxes of the detected objects in the source image. + * + * @param bboxes Bounding box annotations. + * @param imgSize The size of the source image. + */ + void resizeBoundingBox(std::vector& bboxes, const Size& imgSize) const; + + const Size size; + int interpolation; + }; + + //! Convert the color space of the given image + class CV_EXPORTS_W Convert: public Transform{ + public: + /** @brief Initialize the Convert class + * + * @param code color space conversion code (see #ColorConversionCodes). + */ + CV_WRAP explicit Convert(int code); + + /** @brief Apply data augmentation method on source image and its annotation. + * + * @param src Source image. + * @param dst Destination image. + * @param bboxes Annotation of source image, which consists of several bounding boxes of the detected objects in the source image. + * In Python, the bounding box is represented as a four-elements tuple (x, y, w, h), + * in which x, y is the coordinates of the left top corner of the bounding box and w, h is the width and height of the bounding box. + * @param labels Class labels of the detected objects in source image. The order of the labels should correspond to the order of the bboxes. + */ + CV_WRAP void call(InputArray src, OutputArray dst, CV_IN_OUT std::vector& bboxes, std::vector& labels) const override; + + int code; + }; + + //! Randomly translate the given image. + //! Bounding boxes which has an area of less than the threshold in the remaining in the transformed image + //! will be filtered. + //! The resolution of the image is not changed after the transformation. The remaining area after shift is filled with 0. + class CV_EXPORTS_W RandomTranslation: public Transform{ + public: + /** @brief Initialize the RandomTranslation class + * + * @param translations Contains two elements tx and ty, representing tha maximum translation distances + * along x axis and y axis in pixels. tx and ty must be >= 0. The actual translation distances along x and y axes + * are sampled uniformly from [-tx, tx] and [-ty, ty]. + * @param threshold Bounding boxes with area in the remaining image less than threshold will be dropped. + */ + CV_WRAP explicit RandomTranslation(const Vec2i& translations, float threshold=0.25); + + /** @brief Apply data augmentation method on source image and its annotation. + * + * @param src Source image. + * @param dst Destination image. 
+ * @param bboxes Annotation of source image, which consists of several bounding boxes of the detected objects in the source image. + * In Python, the bounding box is represented as a four-elements tuple (x, y, w, h), + * in which x, y is the coordinates of the left top corner of the bounding box and w, h is the width and height of the bounding box. + * @param labels Class labels of the detected objects in source image. The order of the labels should correspond to the order of the bboxes. + */ + CV_WRAP void call(InputArray src, OutputArray dst, CV_IN_OUT std::vector& bboxes, std::vector& labels) const override; + + /** @brief Translate bounding boxes and filter invalid bounding boxes after translation. + * + * @param bboxes Bounding box annotations. + * @param labels Class labels of the detected objects in source image. + * @param imgSize Size of the source image. + * @param tx Translation in x axis in pixel. + * @param ty Translation in y axis in pixel. + */ + CV_WRAP void translateBoundingBox(std::vector& bboxes, std::vector &labels, const Size& imgSize, int tx, int ty) const; + + Vec2i translations; + float threshold; + }; + + //! Rotate the given image and its bounding boxes by a random angle. + //! Filter invalid bounding boxes if its remaining area in the destination image is less than threshold. + //! The size of the destination image is not changed. The remaining area in the destination image is filled with 0. + class CV_EXPORTS_W RandomRotation: public Transform{ + public: + /** @brief Initialize the RandomRotation class. + * + * @param angles Intervals in which the rotation angle is uniformly sampled from. + * @param threshold Bounding boxes with area in the remaining image less than threshold will be dropped. + */ + explicit RandomRotation(const Vec2d& angles, double threshold=0.25); + + /** @brief Apply data augmentation method on source image and its annotation. + * + * @param src Source image. + * @param dst Destination image. + * @param bboxes Annotation of source image, which consists of several bounding boxes of the detected objects in the source image. + * In Python, the bounding box is represented as a four-elements tuple (x, y, w, h), + * in which x, y is the coordinates of the left top corner of the bounding box and w, h is the width and height of the bounding box. + * @param labels Class labels of the detected objects in source image. The order of the labels should correspond to the order of the bboxes. + */ + CV_WRAP void call(InputArray src, OutputArray dst, CV_IN_OUT std::vector& bboxes, std::vector& labels) const override; + + /** @brief Rotate bounding boxes and filter out invalid bounding boxes after rotation. + * + * @param bboxes Bounding box annotations. + * @param labels Class labels of the detected objects in source image. + * @param angle Rotation angle in degree. + * @param cx x coordinate of the rotation center. + * @param cy y coordinate of the rotation center. + * @param imgSize Size of the destination image, used for clamping the coordinates of bounding boxes. + */ + CV_WRAP void rotateBoundingBoxes(std::vector& bboxes, std::vector &labels, double angle, int cx, int cy, const Size& imgSize) const; + + Vec2d angles; + double threshold; + }; + + //! 
@} + } + } +} + +#endif //OPENCV_TRANSFORMS_DET_HPP diff --git a/modules/imgaug/misc/python/pyopencv_imgaug.hpp b/modules/imgaug/misc/python/pyopencv_imgaug.hpp new file mode 100644 index 00000000000..a829fbe3cf9 --- /dev/null +++ b/modules/imgaug/misc/python/pyopencv_imgaug.hpp @@ -0,0 +1,45 @@ +// This file is part of OpenCV project. +// It is subject to the license terms in the LICENSE file found in the top-level directory +// of this distribution and at http://opencv.org/license.html. +#ifndef OPENCV_AUG_MISC_PYTHON_HPP +#define OPENCV_AUG_MISC_PYTHON_HPP +typedef std::vector > vector_Ptr_Transform; +typedef std::vector > vector_Ptr_imgaug_det_Transform; + +//template<> +//bool pyopencv_to(PyObject *o, std::vector > &value, const ArgInfo& info){ +// return pyopencv_to_generic_vec(o, value, info); +//} +template<> struct pyopencvVecConverter > +{ + static bool to(PyObject* obj, std::vector >& value, const ArgInfo& info) + { + return pyopencv_to_generic_vec(obj, value, info); + } + +}; + +template<> struct pyopencvVecConverter > +{ + static bool to(PyObject* obj, std::vector >& value, const ArgInfo& info) + { + return pyopencv_to_generic_vec(obj, value, info); + } + +}; + +template<> struct PyOpenCV_Converter +{ + static bool to(PyObject* obj, unsigned long long& value, const ArgInfo& info){ + if(!obj || obj == Py_None) + return true; + if(PyLong_Check(obj)){ + value = PyLong_AsUnsignedLongLong(obj); + }else{ + return false; + } + return value != (unsigned int)-1 || !PyErr_Occurred(); + } +}; + +#endif \ No newline at end of file diff --git a/modules/imgaug/samples/det_compose_sample.cpp b/modules/imgaug/samples/det_compose_sample.cpp new file mode 100644 index 00000000000..d5a3656879a --- /dev/null +++ b/modules/imgaug/samples/det_compose_sample.cpp @@ -0,0 +1,50 @@ +#include +#include +#include +#include +#include + +using namespace cv; + + +static void drawBoundingBoxes(Mat& img, std::vector& bboxes){ + for(cv::Rect bbox: bboxes){ + cv::Point tl {bbox.x, bbox.y}; + cv::Point br {bbox.x + bbox.width, bbox.y + bbox.height}; + cv::rectangle(img, tl, br, cv::Scalar(0, 255, 0), 2); + } +} + + +int main(){ + Mat src = imread(samples::findFile("lena.jpg"), IMREAD_COLOR); + Mat dst; + + std::vector bboxes{ + Rect{112, 40, 249, 343}, + Rect{61, 273, 113, 228} + }; + + std::vector labels {1, 2}; + + Mat ori_src; + src.copyTo(ori_src); + drawBoundingBoxes(ori_src, bboxes); + + imgaug::det::RandomRotation randomRotation(Vec2d(-30, 30)); + imgaug::det::RandomFlip randomFlip(1); + imgaug::det::Resize resize(Size(224, 224)); + + std::vector > transforms {&randomRotation, &randomFlip, &resize}; + imgaug::det::Compose aug(transforms); + + aug.call(src, dst, bboxes, labels); + + drawBoundingBoxes(dst, bboxes); + + imshow("src", ori_src); + imshow("dst", dst); + waitKey(0); + + return 0; +} \ No newline at end of file diff --git a/modules/imgaug/samples/det_sample.cpp b/modules/imgaug/samples/det_sample.cpp new file mode 100644 index 00000000000..2e553e44127 --- /dev/null +++ b/modules/imgaug/samples/det_sample.cpp @@ -0,0 +1,43 @@ +#include +#include +#include +#include + +using namespace cv; + + +static void drawBoundingBoxes(Mat& img, std::vector& bboxes){ + for(cv::Rect bbox: bboxes){ + cv::Point tl {bbox.x, bbox.y}; + cv::Point br {bbox.x + bbox.width, bbox.y + bbox.height}; + cv::rectangle(img, tl, br, cv::Scalar(0, 255, 0), 2); + } +} + + +int main(){ + Mat src = imread(samples::findFile("lena.jpg"), IMREAD_COLOR); + Mat dst; + + std::vector bboxes{ + Rect{112, 40, 249, 343}, + Rect{61, 273, 113, 
228} + }; + + std::vector labels {1, 2}; + + Mat ori_src; + src.copyTo(ori_src); + drawBoundingBoxes(ori_src, bboxes); + + imgaug::det::RandomRotation aug(Vec2d(-30, 30)); + aug.call(src, dst, bboxes, labels); + + drawBoundingBoxes(dst, bboxes); + + imshow("src", ori_src); + imshow("dst", dst); + waitKey(0); + + return 0; +} \ No newline at end of file diff --git a/modules/imgaug/samples/opencv_aug_demo.py b/modules/imgaug/samples/opencv_aug_demo.py new file mode 100644 index 00000000000..a6dd8f45170 --- /dev/null +++ b/modules/imgaug/samples/opencv_aug_demo.py @@ -0,0 +1,55 @@ +import cv2 +import copy + + +def random_crop(image): + transform = cv2.imgaug.RandomCrop((300, 300)) + return transform.call(image) + + +def random_flip(image): + transform = cv2.imgaug.RandomFlip(flipCode=1, p=0.8) + return transform.call(image) + + +def center_crop(image): + transform = cv2.imgaug.CenterCrop(size=(100, 100)) + return transform.call(image) + + +def pad(image): + transform = cv2.imgaug.Pad(padding=(10, 10, 10, 10)) + return transform.call(image) + + +def random_resized_crop(image): + transform = cv2.imgaug.RandomResizedCrop(size=(100, 100)) + return transform.call(image) + + +def compose(image): + transform = cv2.imgaug.Compose([ + cv2.imgaug.Resize((1024, 1024)), + cv2.imgaug.RandomCrop((800, 800)), + cv2.imgaug.RandomFlip(), + cv2.imgaug.CenterCrop((512, 512)), + ]) + return transform.call(image) + + +def main(): + # read image + input_path = "../../../samples/data/corridor.jpg" + src = cv2.imread(input_path) + + while True: + image = copy.copy(src) + image = compose(image) + cv2.imshow("dst", image) + ch = cv2.waitKey(1000) + if ch == 27: + break + + +if __name__ == '__main__': + main() diff --git a/modules/imgaug/samples/train_cls_net.py b/modules/imgaug/samples/train_cls_net.py new file mode 100644 index 00000000000..d5944e18df6 --- /dev/null +++ b/modules/imgaug/samples/train_cls_net.py @@ -0,0 +1,99 @@ +import os +import pandas as pd +import argparse +import torch +import cv2 +from torchvision import transforms +from torchvision.models import resnet18 +from torch.utils import data +import numpy as np +import time +import tqdm + + +def get_args(): + parser = argparse.ArgumentParser() + parser.add_argument("--root", type=str, default="imagenette2-320") + parser.add_argument("--lr", type=float, default=3e-4) + + return parser.parse_args() + + +class ImagenetteDataset(torch.utils.data.Dataset): + def __init__(self, root, df_data, mode='train', transform=None): + super(ImagenetteDataset, self).__init__() + assert mode in ['train', 'valid'] + + self.root = root + self.transform = transform + labels = ['n01440764', 'n02102040', 'n02979186', 'n03000684', 'n03028079', 'n03394916', 'n03417042', 'n03425413', 'n03445777', 'n03888257'] + self.label_to_num = {v: k for k, v in enumerate(labels)} + + if mode == 'train': + self.df_data = df_data[df_data['is_valid'] == False][:256] + else: + self.df_data = df_data[df_data['is_valid'] == True] + + def __len__(self): + return len(self.df_data) + + def __getitem__(self, idx): + path = self.df_data.iloc[idx]['path'] + path = os.path.join(self.root, path) + image = self.get_image(path) + label = path.split('/')[-2] + label = self.label_to_num[label] + return image, label + + def get_image(self, path): + image = cv2.imread(path) + if self.transform: + image = self.transform.call(image) + image = np.transpose(image, (2, 0, 1)) + return torch.tensor(image, dtype=torch.float) + + +def train(dataloader, model, num_epochs, criterion, optimizer): + start = time.time() + 
for epoch in range(num_epochs): + model.train() + + for inputs, targets in tqdm.tqdm(dataloader, total=len(dataloader)): + optimizer.zero_grad() + preds = model(inputs) + loss = criterion(preds, targets) + loss.backward() + optimizer.step() + + end = time.time() + print(end-start) + + +def main(): + args = get_args() + root_dir = args.root + lr = args.lr + + df_train = pd.read_csv(os.path.join(root_dir, "noisy_imagenette.csv")) + print('load %d records' % len(df_train)) + + transforms = cv2.Compose([ + cv2.RandomCrop((300, 300), (0,0,0,0)), + cv2.RandomFlip(), + cv2.Resize((500, 500)), + cv2.Normalize(mean=(0.406, 0.456, 0.485), std=(0.225, 0.224, 0.229)) + ]) + + train_set = ImagenetteDataset(root_dir, df_train, 'train', transforms) + + train_loader = data.DataLoader(train_set, num_workers=0, batch_size=16, drop_last=True, shuffle=True) + model = resnet18(pretrained=True) + model.fc = torch.nn.Linear(in_features=512, out_features=10) + optimizer = torch.optim.Adam(model.parameters(), lr=lr) + criterion = torch.nn.CrossEntropyLoss() + + train(train_loader, model, 1, criterion, optimizer) + + +if __name__ == '__main__': + main() diff --git a/modules/imgaug/samples/train_det_net.py b/modules/imgaug/samples/train_det_net.py new file mode 100644 index 00000000000..2230445e582 --- /dev/null +++ b/modules/imgaug/samples/train_det_net.py @@ -0,0 +1,151 @@ +import os +import time + +import numpy as np +import torch +import cv2 +import argparse +import torchvision +from tqdm import tqdm + + +def get_args(): + parser = argparse.ArgumentParser() + parser.add_argument("--root", type=str, default="PennFudanPed") + parser.add_argument("--lr", type=float, default=3e-4) + + return parser.parse_args() + + +class PennFudanDataset(torch.utils.data.Dataset): + def __init__(self, root, transforms=None): + self.root = root + self.transforms = transforms + # load all image files, sorting them to + # ensure that they are aligned + self.imgs = list(sorted(os.listdir(os.path.join(root, "PNGImages")))) + self.masks = list(sorted(os.listdir(os.path.join(root, "PedMasks")))) + + def _get_boxes(self, mask): + obj_ids = np.unique(mask) + # first id is the background, so remove it + obj_ids = obj_ids[1:] + + # split the color-encoded mask into a set + # of binary masks + masks = mask == obj_ids[:, None, None] + + # get bounding box coordinates for each mask + num_objs = len(obj_ids) + for i in range(num_objs): + pos = np.where(masks[i]) + xmin = np.min(pos[1]) + xmax = np.max(pos[1]) + ymin = np.min(pos[0]) + ymax = np.max(pos[0]) + yield xmin, ymin, xmax, ymax + + def __getitem__(self, idx): + # load images and masks + img_path = os.path.join(self.root, "PNGImages", self.imgs[idx]) + mask_path = os.path.join(self.root, "PedMasks", self.masks[idx]) + img = cv2.imread(img_path) + img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB) + # mask is array of size (H, W), all elements of array are integers + # background is 0, and each distinct person is represented as a distinct integer starting from 1 + # you can treat mask as grayscale image + mask = cv2.imread(mask_path, cv2.IMREAD_GRAYSCALE) + boxes = [] + for x1, y1, x2, y2 in self._get_boxes(mask): + # NOTE: in opencv, box is represented as (x, y, width, height) + boxes.append([x1, y1, x2-x1, y2-y1]) + num_objs = len(boxes) + labels = torch.ones((num_objs,), dtype=torch.int64) + + if self.transforms is not None: + img, boxes = self.transforms.call(img, boxes) + + # 1. transpose from (h, w, c) to (c, h, w) + # 2. normalize data into range 0-1 + # 3. 
convert from np.array to torch.tensor + img = torch.tensor(np.transpose(img, (2, 0, 1)), dtype=torch.float32) + boxes = [[x1, y1, x1+width, y1+height] for x1, y1, width, height in boxes] + boxes = torch.as_tensor(boxes, dtype=torch.float32) + + return img, boxes, labels + + def __len__(self): + return len(self.imgs) + + @staticmethod + def collate_fn(batch): + images = list() + boxes = list() + labels = list() + targets = list() + + for item in batch: + images.append(item[0]) + # boxes.append(item[1]) + # labels.append(item[2]) + target = {"boxes": item[1], "labels": item[2]} + targets.append(target) + + images = torch.stack(images, dim=0) + + return images, targets + + +def get_transforms(): + + transforms = cv2.det.Compose([ + cv2.det.RandomFlip(), + cv2.det.Resize((500, 500)), + ]) + + return transforms + + +def train(num_epochs, device, model, dataloader, optimizer): + for epoch in range(num_epochs): + model.train() + for batch in tqdm(dataloader, total=len(dataloader)): + optimizer.zero_grad() + + images, targets = batch + images = images.to(device) + + outputs = model(images, targets) + losses = sum(outputs.values()) + + losses.backward() + optimizer.step() + + +def main(): + args = get_args() + + device = torch.device("cuda" if torch.cuda.is_available() else "cpu") + + transforms = get_transforms() + dataset = PennFudanDataset(args.root, transforms=transforms) + + indices = torch.randperm(len(dataset)).tolist() + train_set = torch.utils.data.Subset(dataset, indices[:-50]) + test_set = torch.utils.data.Subset(dataset, indices[-50:]) + + train_loader = torch.utils.data.DataLoader(train_set, batch_size=4, shuffle=True, num_workers=0, collate_fn=PennFudanDataset.collate_fn) + test_loader = torch.utils.data.DataLoader(test_set, batch_size=4, shuffle=False, num_workers=0, collate_fn=PennFudanDataset.collate_fn) + + model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT").to(device) + + parameters = model.parameters() + optimizer = torch.optim.AdamW(parameters, lr=args.lr) + start = time.time() + train(2, device, model, train_loader, optimizer) + end = time.time() + print(end-start) + + +if __name__ == '__main__': + main() diff --git a/modules/imgaug/src/functional.cpp b/modules/imgaug/src/functional.cpp new file mode 100644 index 00000000000..73c0851b022 --- /dev/null +++ b/modules/imgaug/src/functional.cpp @@ -0,0 +1,76 @@ +#include "precomp.hpp" + +namespace cv{ + + void adjustBrightness(Mat& img, double brightness_factor){ + CV_Assert(brightness_factor >= 0); + + int channels = img.channels(); + if(channels != 1 && channels != 3){ + CV_Error(Error::BadNumChannels, "Only support images with 1 or 3 channels"); + } + img = img * brightness_factor; + } + + void adjustContrast(Mat& img, double contrast_factor){ + CV_Assert(contrast_factor >= 0); + + int num_channels = img.channels(); + if(num_channels != 1 && num_channels != 3){ + CV_Error(Error::BadNumChannels, "Only support images with 1 or 3 channels"); + } + Mat* channels = new Mat[num_channels]; + split(img, channels); + std::vector new_channels; + for(int i=0; i < num_channels; i++){ + Mat& channel = channels[i]; + Scalar avg = mean(channel); + Mat avg_mat(channel.size(), channel.type(), avg); + Mat new_channel = contrast_factor * channel + (1-contrast_factor) * avg_mat; + new_channels.push_back(new_channel); + } + merge(new_channels, img); + delete[] channels; + } + + void adjustSaturation(Mat& img, double saturation_factor){ + CV_Assert(saturation_factor >= 0); + + int num_channels = img.channels(); + 
if(num_channels != 1 && num_channels != 3){ + CV_Error(Error::BadNumChannels, "Only support images with 1 or 3 channels"); + } + if(img.channels() == 1) return; + Mat gray; + cvtColor(img, gray, COLOR_BGR2GRAY); + std::vector gray_arrays = {gray, gray, gray}; + merge(gray_arrays, gray); + img = saturation_factor * img + (1-saturation_factor) * gray; + } + + void adjustHue(Mat& img, double hue_factor) { + // FIXME: the range of hue_factor needs to be modified + CV_Assert(hue_factor >= -1 && hue_factor <= 1); + + int num_channels = img.channels(); + if (num_channels != 1 && num_channels != 3) { + CV_Error(Error::BadNumChannels, "Only support images with 1 or 3 channels"); + } + + if (num_channels == 1) return; + int hue_shift = saturate_cast (hue_factor * 180); + Mat hsv; + cvtColor(img, hsv, COLOR_BGR2HSV); + for (int j=0; j(j, i)[0]; + if(h + hue_shift > 180) + h = h + hue_shift - 180; + else + h = h + hue_shift; + hsv.at(j, i)[0] = h; + } + } + cvtColor(hsv, img, COLOR_HSV2BGR); + } +} diff --git a/modules/imgaug/src/precomp.hpp b/modules/imgaug/src/precomp.hpp new file mode 100644 index 00000000000..d8bc71cdcd9 --- /dev/null +++ b/modules/imgaug/src/precomp.hpp @@ -0,0 +1,13 @@ +// This file is part of OpenCV project. +// It is subject to the license terms in the LICENSE file found in the top-level directory +// of this distribution and at http://opencv.org/license.html. +#ifndef OPENCV_AUG_PRECOMP_H +#define OPENCV_AUG_PRECOMP_H + +#include "opencv2/imgaug.hpp" +#include +#include +#include +#include + +#endif diff --git a/modules/imgaug/src/rng.cpp b/modules/imgaug/src/rng.cpp new file mode 100644 index 00000000000..262ef28275d --- /dev/null +++ b/modules/imgaug/src/rng.cpp @@ -0,0 +1,15 @@ +// This file is part of OpenCV project. +// It is subject to the license terms in the LICENSE file found in the top-level directory +// of this distribution and at http://opencv.org/license.html. +#include "precomp.hpp" + +namespace cv{ + namespace imgaug{ + uint64 state = getTickCount(); + RNG rng(state); + + void setSeed(uint64 seed){ + rng.state = seed; + } + } +} \ No newline at end of file diff --git a/modules/imgaug/src/transforms.cpp b/modules/imgaug/src/transforms.cpp new file mode 100644 index 00000000000..89225cfe4e3 --- /dev/null +++ b/modules/imgaug/src/transforms.cpp @@ -0,0 +1,531 @@ +// This file is part of OpenCV project. +// It is subject to the license terms in the LICENSE file found in the top-level directory +// of this distribution and at http://opencv.org/license.html. 
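+// Implementations of the single-image augmentation transforms (RandomCrop, RandomFlip, Resize,
+// ColorJitter, Normalize, RandomAffine, ...) declared in opencv2/imgaug.hpp; the bounding-box-aware
+// variants live in transforms_det.cpp.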
+#include "precomp.hpp" +#include +#include + +namespace cv{ + namespace imgaug{ + extern RNG rng; + + static void getRandomCropParams(int h, int w, int th, int tw, int* x, int* y); + static void getRandomResizedCropParams(int height, int width, const Vec2d& scale, const Vec2d& ratio, Rect& rect); + static void getRandomErasingCropParams(int height, int width, const Vec2d& scale, const Vec2d& ratio, Rect& rect); + static void getRandomAffineParams(const Size& size, const Vec2f& degrees, const Vec2f& translations, const Vec2f& scales, const Vec4f& shears, float* angle, float* translation_x, float* translation_y, float* scale, float* shear_x, float* shear_y); + static void getAffineMatrix(Mat mat, float angle, float tx, float ty, float scale, float shear_x, float shear_y, int cx, int cy); + + void randomCrop(InputArray _src, OutputArray _dst, const Size& sz, const Vec4i& padding, bool pad_if_need, int fill, int padding_mode){ + Mat src = _src.getMat(); + + if(padding != Vec4i()){ + copyMakeBorder(src, src, padding[0], padding[1], padding[2], padding[3], padding_mode, fill); + } + + // pad the height if needed + if(pad_if_need && src.rows < sz.height){ + Vec4i _padding = {sz.height - src.rows, sz.height - src.rows, 0, 0}; + copyMakeBorder(src, src, _padding[0], _padding[1], _padding[2], _padding[3], padding_mode, fill); + } + // pad the width if needed + if(pad_if_need && src.cols < sz.width){ + Vec4i _padding = {0, 0, sz.width - src.cols, sz.width - src.cols}; + copyMakeBorder(src, src, _padding[0], _padding[1], _padding[2], _padding[3], padding_mode, fill); + } + + int x, y; + getRandomCropParams(src.rows, src.cols, sz.height, sz.width, &x, &y); + + Mat RoI(src, Rect(x, y, sz.width, sz.height)); + RoI.copyTo(_dst); + + // NOTE: inplace operation not works in converting from python to numpy + // _dst.move(RoI); + } + + + static void getRandomCropParams(int h, int w, int th, int tw, int* x, int* y){ + if(h+1 < th || w+1 < tw){ + CV_Error( Error::StsBadSize, "The cropped size is larger than the image size" ); + } + if(h == th && w == tw){ + (*x) = 0; + (*y) = 0; + return; + } + + (*x) = rng.uniform(0, w-tw+1); + (*y) = rng.uniform(0, h-th+1); + + } + + RandomCrop::RandomCrop(const Size& _sz, const Vec4i& _padding, bool _pad_if_need, int _fill, int _padding_mode): + sz (_sz), + padding (_padding), + pad_if_need (_pad_if_need), + fill (_fill), + padding_mode (_padding_mode){}; + + void RandomCrop::call(InputArray src, OutputArray dst) const{ + randomCrop(src, dst, sz, padding, pad_if_need, fill, padding_mode); + } + + void randomFlip(InputArray _src, OutputArray _dst, int flipCode, double p){ + + bool flag = rng.uniform(0., 1.) 
< p; + + Mat src = _src.getMat(); + + if(!flag){ + _dst.move(src); + return; + } + flip(src, src, flipCode); + _dst.move(src); + } + + RandomFlip::RandomFlip(int _flipCode, double _p): + flipCode(_flipCode), + p(_p){}; + + void RandomFlip::call(InputArray src, OutputArray dst) const{ + randomFlip(src, dst); + } + + Compose::Compose(std::vector >& _transforms): + transforms(_transforms){}; + + void Compose::call(InputArray _src, OutputArray _dst) const{ + Mat src = _src.getMat(); + + for(auto it = transforms.begin(); it != transforms.end(); ++it){ + (*it)->call(src, src); + } + src.copyTo(_dst); + } + + Resize::Resize(const Size& _sz, int _interpolation): + sz(_sz), + interpolation(_interpolation){}; + + void Resize::call(InputArray src, OutputArray dst) const{ + resize(src, dst, sz, 0, 0, interpolation); + } + + void centerCrop(InputArray _src, OutputArray _dst, const Size& size) { + Mat src = _src.getMat(); + Mat padded(src); + // pad the input image if needed + if (size.width > src.cols || size.height > src.rows) { + int top = size.height - src.rows > 0 ? static_cast((size.height - src.rows) / 2) : 0; + int bottom = size.height - src.rows > 0 ? static_cast((size.height - src.rows) / 2) : 0; + int left = size.width - src.cols > 0 ? static_cast((size.width - src.cols) / 2) : 0; + int right = size.width - src.cols > 0 ? static_cast((size.width - src.cols) / 2) : 0; + + // fill with value 0 + copyMakeBorder(src, padded, top, bottom, left, right, BORDER_CONSTANT, 0); + } + + int x = static_cast((padded.cols - size.width) / 2); + int y = static_cast((padded.rows - size.height) / 2); + + Mat cropped(padded, Rect(x, y, size.width, size.height)); + _dst.move(cropped); + } + + CenterCrop::CenterCrop(const Size& _size) : + size(_size) {}; + + void CenterCrop::call(InputArray src, OutputArray dst) const { + centerCrop(src, dst, size); + } + + Pad::Pad(const Vec4i& _padding, const Scalar& _fill, int _padding_mode) : + padding(_padding), + fill(_fill), + padding_mode(_padding_mode) {}; + + void Pad::call(InputArray src, OutputArray dst) const { + copyMakeBorder(src, dst, padding[0], padding[1], padding[2], padding[3], padding_mode, fill); + } + + void randomResizedCrop(InputArray _src, OutputArray _dst, const Size& size, const Vec2d& scale, const Vec2d& ratio, int interpolation) { + // Ensure scale range and ratio range are valid + CV_Assert(scale[0] <= scale[1] && ratio[0] <= ratio[1]); + + Mat src = _src.getMat(); + + Rect crop_rect; + getRandomResizedCropParams(src.rows, src.cols, scale, ratio, crop_rect); + Mat cropped(src, Rect(crop_rect)); + resize(cropped, _dst, size, 0.0, 0.0, interpolation); + } + + static void getRandomResizedCropParams(int height, int width, const Vec2d& scale, const Vec2d& ratio, Rect& rect) { + // This implementation is inspired from the implementation in torchvision + // https://github.com/pytorch/vision/blob/main/torchvision/transforms/transforms.py + + int area = height * width; + + for (int i = 0; i < 10; i++) { + double target_area = rng.uniform(scale[0], scale[1]) * area; + double aspect_ratio = rng.uniform(ratio[0], ratio[1]); + + int w = static_cast(round(sqrt(target_area * aspect_ratio))); + int h = static_cast(round(sqrt(target_area / aspect_ratio))); + + if (w > 0 && w <= width && h > 0 && h <= height) { + rect.x = rng.uniform(0, width - w + 1); + rect.y = rng.uniform(0, height - h + 1); + rect.width = w; + rect.height = h; + return; + } + } + + // Center Crop + double in_ratio = static_cast(width) / height; + if (in_ratio < ratio[0]) { + rect.width = width; + 
rect.height = static_cast (round(width / ratio[0])); + } + else if (in_ratio > ratio[1]) { + rect.height = height; + rect.width = static_cast (round(height * ratio[1])); + } + else { + rect.width = width; + rect.height = height; + } + rect.x = (width - rect.width) / 2; + rect.y = (height - rect.height) / 2; + + } + + RandomResizedCrop::RandomResizedCrop(const Size& _size, const Vec2d& _scale, const Vec2d& _ratio, int _interpolation) : + size(_size), + scale(_scale), + ratio(_ratio), + interpolation(_interpolation) {}; + + void RandomResizedCrop::call(InputArray src, OutputArray dst) const{ + randomResizedCrop(src, dst, size, scale, ratio, interpolation); + } + + void colorJitter(InputArray _src, OutputArray _dst, const Vec2d& brightness, const Vec2d& contrast, const Vec2d& saturation, const Vec2d& hue){ + // TODO: check input values + Mat src = _src.getMat(); + + double brightness_factor = 1, contrast_factor = 1, saturation_factor = 1, hue_factor = 0; + + if(brightness != Vec2d()) + brightness_factor = rng.uniform(brightness[0], brightness[1]); + if(contrast != Vec2d()) + contrast_factor = rng.uniform(contrast[0], contrast[1]); + if(saturation != Vec2d()) + saturation_factor = rng.uniform(saturation[0], saturation[1]); + if(hue != Vec2d()) + hue_factor = rng.uniform(hue[0], hue[1]); + + int order[4] = {1,2,3,4}; + std::random_shuffle(order, order+4); + + for(int i : order){ + if(i == 1 && brightness_factor != 1) + cv::adjustBrightness(src, brightness_factor); + if(i == 2 && contrast_factor != 1) + cv::adjustContrast(src, contrast_factor); + if(i == 3 && saturation_factor != 1) + cv::adjustSaturation(src, saturation_factor); + if(i == 4 && hue_factor != 0) + cv::adjustHue(src, hue_factor); + } + + _dst.move(src); + } + + ColorJitter::ColorJitter(const Vec2d& _brightness, const Vec2d& _contrast, const Vec2d& _saturation, + const Vec2d& _hue): + brightness(_brightness), + contrast(_contrast), + saturation(_saturation), + hue(_hue){}; + + void ColorJitter::call(InputArray src, OutputArray dst) const{ + colorJitter(src, dst, brightness, contrast, saturation, hue); + } + + void randomRotation(InputArray _src, OutputArray _dst, const Vec2d& degrees, int interpolation, const Point2f& center, const Scalar& fill){ + Mat src = _src.getMat(); + // TODO: check the validation of degrees + double angle = rng.uniform(degrees[0], degrees[1]); + + Point2f pt(src.cols/2., src.rows/2.); + if(center != Point2f()) pt = center; + + Mat r = getRotationMatrix2D(pt, angle, 1.0); + + // TODO: auto expand dst size to fit the rotated image + warpAffine(src, _dst, r, src.size(), interpolation, BORDER_CONSTANT, fill); + } + + RandomRotation::RandomRotation(const Vec2d& _degrees, int _interpolation, const Point2f& _center, const Scalar& _fill): + degrees(_degrees), + interpolation(_interpolation), + center(_center), + fill(_fill){}; + + void RandomRotation::call(InputArray src, OutputArray dst) const{ + randomRotation(src, dst, degrees, interpolation, center, fill); + } + + void grayScale(InputArray _src, OutputArray _dst, int num_channels){ + Mat src = _src.getMat(); + cvtColor(src, src, COLOR_BGR2GRAY); + + if(num_channels == 1){ + _dst.move(src); + return; + } + Mat channels[3] = {src, src, src}; + merge(channels, 3, _dst); + } + + GrayScale::GrayScale(int _num_channels): + num_channels(_num_channels){}; + + void GrayScale::call(InputArray _src, OutputArray _dst) const{ + grayScale(_src, _dst, num_channels); + } + + void randomGrayScale(InputArray _src, OutputArray _dst, double p){ + if(rng.uniform(0.0, 1.0) < p){ + 
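+            // with probability p, convert the image to grayscale while keeping its original number of channels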
grayScale(_src, _dst, _src.channels()); + return; + } + Mat src = _src.getMat(); + _dst.move(src); + } + + RandomGrayScale::RandomGrayScale(double _p): + p(_p){}; + + void RandomGrayScale::call(InputArray src, OutputArray dst) const{ + randomGrayScale(src, dst); + } + + void randomErasing(InputArray _src, OutputArray _dst, double p, const Vec2d& scale, const Vec2d& ratio, const Scalar& value, bool inplace){ + // TODO: check the range of input values + Mat src = _src.getMat(); + if(rng.uniform(0., 1.) >= p){ + _dst.move(src); + return; + } + + Rect roi; + getRandomErasingCropParams(src.rows, src.cols, scale, ratio, roi); + + Mat erased(src, roi); + + int rows = erased.rows; + int cols = erased.cols; + int cn = erased.channels(); + for(int j=0; j(j); + for(int i=0; i(round(sqrt(target_area * aspect_ratio))); + int h = static_cast(round(sqrt(target_area / aspect_ratio))); + + if (w > 0 && w <= width && h > 0 && h <= height) { + rect.x = rng.uniform(0, width - w + 1); + rect.y = rng.uniform(0, height - h + 1); + rect.width = w; + rect.height = h; + return; + } + } + + // Center Crop + double in_ratio = static_cast(width) / height; + if (in_ratio < ratio[0]) { + rect.width = width; + rect.height = static_cast (round(width / ratio[0])); + } + else if (in_ratio > ratio[1]) { + rect.height = height; + rect.width = static_cast (round(height * ratio[1])); + } + else { + rect.width = width; + rect.height = height; + } + rect.x = (width - rect.width) / 2; + rect.y = (height - rect.height) / 2; + } + + RandomErasing::RandomErasing(double _p, const Vec2d& _scale, const Vec2d& _ratio, const Scalar& _value, bool _inplace): + p(_p), + scale(_scale), + ratio(_ratio), + value(_value), + inplace(_inplace){}; + + void RandomErasing::call(InputArray src, OutputArray dst) const{ + randomErasing(src, dst, p, scale, ratio, value, inplace); + } + + // NOTE: because Scalar contains 4 elements at most, normalize can only apply to image with channels no more than 4. 
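+        // Normalize rescales each channel to the [0, 1] range first and then standardizes it
+        // with the given per-channel mean and std (see call() below)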
+ Normalize::Normalize(const Scalar& _mean, const Scalar& _std): + mean(_mean), + std(_std){}; + + void Normalize::call(InputArray _src, OutputArray _dst) const{ + Mat src = _src.getMat(); + + _dst.create(src.size(), CV_32FC3); + Mat dst = _dst.getMat(); + + int cn = src.channels(); + std::vector channels; + split(src, channels); + + // normalize each channel to 0-1 first + for(int i=0; i(src.cols / 2); + center.y = static_cast(src.rows / 2); + }else{ + center = _center; + } + + float angle, translation_x, translation_y, scale, shear_x, shear_y; + getRandomAffineParams(src.size(), degrees, translations, scales, shears, &angle, &translation_x, &translation_y, &scale, &shear_x, &shear_y); + + Mat affine_matrix = Mat::eye(2, 3, CV_32F); + + // TODO: check whether equations are right + getAffineMatrix(affine_matrix, angle, translation_x, translation_y, scale, shear_x, shear_y, center.x, center.y); + warpAffine(src, src, affine_matrix, src.size(), interpolation, BORDER_CONSTANT, fill); + _dst.move(src); + } + + static void getAffineMatrix(Mat mat, float angle, float tx, float ty, float scale, float shear_x, float shear_y, int cx, int cy){ + float* data = mat.ptr(0); + + // convert from degrees to radians + angle = (float)(CV_PI * angle) / 180; + shear_x = (float)(CV_PI * shear_x) / 180; + shear_y = (float)(CV_PI * shear_y) / 180; + + data[0] = scale * cos(angle - shear_y) / cos(shear_y); + data[1] = scale * (-cos(angle - shear_y) * tan(shear_x) / cos(shear_y) - sin(angle)); + data[3] = scale * sin(angle - shear_y) / cos(shear_y); + data[4] = scale * (-sin(angle - shear_y) * tan(shear_x) / cos(shear_y) + cos(angle)); + data[2] = cx * (1-data[0]) + data[1] * (-cy) + tx; + data[5] = cy * (1-data[4]) + data[3] * (-cx) + ty; + } + + static void getRandomAffineParams(const Size& size, const Vec2f& degrees, const Vec2f& translations, const Vec2f& scales, const Vec4f& shears, float* angle, float* translation_x, float* translation_y, float* scale, float* shear_x, float* shear_y){ + + if(degrees == Vec2f(0, 0)) { + *angle = 0; + } + else{ + *angle = rng.uniform(degrees[0], degrees[1]); + } + + if(translations == Vec2f(0, 0)) { + *translation_x = 0; + *translation_y = 0; + } + else{ + *translation_x = rng.uniform(-translations[0], translations[0]) * size.width; + *translation_y = rng.uniform(-translations[1], translations[1]) * size.height; + } + + if(scales == Vec2f(1, 1)) { + *scale = 1; + } + else{ + *scale = rng.uniform(scales[0], scales[1]); + } + + if(shears == Vec4f(0, 0, 0, 0)) { + *shear_x = 0; + *shear_y = 0; + } + else{ + *shear_x = rng.uniform(shears[0], shears[1]); + *shear_y = rng.uniform(shears[2], shears[3]); + } + + } + + RandomAffine::RandomAffine(const Vec2f& _degrees, const Vec2f& _translations, const Vec2f& _scales, const Vec4f& _shears, int _interpolation, const Scalar& _fill, const Point2i& _center): + degrees(_degrees), + translations(_translations), + scales(_scales), + shears(_shears), + interpolation(_interpolation), + fill(_fill), + center(_center){}; + + void RandomAffine::call(InputArray src, OutputArray dst) const{ + randomAffine(src, dst, degrees, translations, scales, shears, interpolation, fill, center); + } + } +} diff --git a/modules/imgaug/src/transforms_det.cpp b/modules/imgaug/src/transforms_det.cpp new file mode 100644 index 00000000000..b49e721d7be --- /dev/null +++ b/modules/imgaug/src/transforms_det.cpp @@ -0,0 +1,220 @@ +// This file is part of OpenCV project. 
+// It is subject to the license terms in the LICENSE file found in the top-level directory +// of this distribution and at http://opencv.org/license.html. +#include "precomp.hpp" +#include +#include + +namespace cv{ + namespace imgaug{ + extern RNG rng; + + namespace det{ + int clamp(int v, int lo, int hi); + void rotate(int* x, int* y, int cx, int cy, double angle); + + Compose::Compose(std::vector >& _transforms): + transforms(_transforms){}; + + void Compose::call(InputArray _src, OutputArray _dst, std::vector& target, std::vector& labels) const{ + Mat src = _src.getMat(); + for(cv::imgaug::det::Transform* transform:transforms){ + transform->call(src, src, target, labels); + } + src.copyTo(_dst); + } + + RandomFlip::RandomFlip(int _flipCode, float _p): + flipCode(_flipCode), p(_p) + { + if(p < 0 || p > 1){ + CV_Error(Error::Code::StsBadArg, "probability p must be between range 0 and 1"); + } + }; + + void RandomFlip::call(InputArray _src, OutputArray _dst, std::vector& target, std::vector& labels) const{ + CV_Assert(target.size() == labels.size()); + bool flag = rng.uniform(0., 1.) < p; + + Mat src = _src.getMat(); + if(!flag){ + _dst.move(src); + return; + } + + flipBoundingBox(target, src.size()); + flip(src, src, flipCode); + _dst.move(src); + } + + void RandomFlip::flipBoundingBox(std::vector& target, const Size& size) const{ + /* + * flipCode = 0 (flip vertically): (x', y') = (x, img.height - y - bbox.height) + * flipCode > 0 (flip horizontally): (x', y') = (img.width - x - bbox.width, y) + * flipCode < 0 (flip diagonally): (x', y') = (img.width - x - bbox.width, img.height - y - bbox.height) + */ + for(unsigned i = 0; i < target.size(); i++){ + if(flipCode == 0){ + target[i].y = size.height - target[i].y - target[i].height; + }else if(flipCode > 0){ + target[i].x = size.width - target[i].x - target[i].width; + }else{ + target[i].x = size.width - target[i].x - target[i].width; + target[i].y = size.height - target[i].y - target[i].height; + } + } + } + + Resize::Resize(const Size& _size, int _interpolation): + size(_size), interpolation(_interpolation){}; + + void Resize::call(InputArray _src, OutputArray dst, std::vector& target, std::vector& labels) const{ + CV_Assert(target.size() == labels.size()); + Mat src = _src.getMat(); + resize(src, dst, size, 0, 0, interpolation); + resizeBoundingBox(target, src.size()); + } + + void Resize::resizeBoundingBox(std::vector& target, const Size& imgSize) const{ + for(unsigned i=0; i(size.width) / imgSize.width * target[i].x; + target[i].y = static_cast(size.height) / imgSize.height * target[i].y; + target[i].width = static_cast(size.width) / imgSize.width * target[i].width; + target[i].height = static_cast(size.height) / imgSize.height * target[i].height; + } + } + + Convert::Convert(int _code): + code(_code){}; + + void Convert::call(InputArray src, OutputArray dst, std::vector& target, std::vector& labels) const{ + CV_Assert(target.size() == labels.size()); + cvtColor(src, dst, code); + } + + RandomTranslation::RandomTranslation(const cv::Vec2i& _translations, float _threshold): + translations(_translations), + threshold(_threshold){}; + + + void RandomTranslation::call(cv::InputArray _src, cv::OutputArray _dst, std::vector &bboxes, std::vector& labels) const { + CV_Assert(bboxes.size() == labels.size()); + int tx = rng.uniform(-translations[0], translations[0]); + int ty = rng.uniform(-translations[1], translations[1]); + + Mat translation_matrix = Mat::eye(2, 3, CV_32F); + float* data = translation_matrix.ptr(); + data[0] = 1; + data[1] 
= 0; + data[2] = tx; + data[3] = 0; + data[4] = 1; + data[5] = ty; + + cv::warpAffine(_src, _dst, translation_matrix, _src.size()); + translateBoundingBox(bboxes, labels, _src.size(), tx, ty); + } + + + void RandomTranslation::translateBoundingBox(std::vector &bboxes, std::vector &labels, const cv::Size &imgSize, int tx, int ty) const { + for(unsigned i=0; i < bboxes.size(); i++){ + int x1 = clamp(bboxes[i].x + tx, 0, imgSize.width); + int y1 = clamp(bboxes[i].y + ty, 0, imgSize.height); + int x2 = clamp(bboxes[i].x + bboxes[i].width + tx, 0, imgSize.width); + int y2 = clamp(bboxes[i].y + bboxes[i].height + ty, 0, imgSize.height); + int w = x2 - x1; + int h = y2 - y1; + if((float)(w * h) / (bboxes[i].width * bboxes[i].height) < threshold){ + bboxes.erase(bboxes.begin() + i); + labels.erase(labels.begin() + i); + }else{ + bboxes[i].x = x1; + bboxes[i].y = y1; + bboxes[i].width = x2 - x1; + bboxes[i].height = y2 - y1; + } + } + } + + RandomRotation::RandomRotation(const cv::Vec2d &_angles, double _threshold): + angles(_angles), + threshold(_threshold){}; + + void RandomRotation::call(cv::InputArray _src, cv::OutputArray _dst, std::vector &bboxes, + std::vector &labels) const { + CV_Assert(bboxes.size() == labels.size()); + Mat src = _src.getMat(); + double angle = rng.uniform(angles[0], angles[1]); + Mat rotation_matrix = getRotationMatrix2D(cv::Point2f(src.cols/2., src.rows/2.), angle, 1); + warpAffine(src, _dst, rotation_matrix, src.size()); + + Mat dst = _dst.getMat(); + rotateBoundingBoxes(bboxes, labels, angle, src.cols / 2, src.rows / 2, dst.size()); + } + + void RandomRotation::rotateBoundingBoxes(std::vector &bboxes, std::vector &labels, + double angle, int cx, int cy, const Size& imgSize) const { + angle = -angle * CV_PI / 180; + + for(unsigned i=0; i < bboxes.size(); i++){ + int x1 = bboxes[i].x; + int y1 = bboxes[i].y; + int x2 = bboxes[i].x + bboxes[i].width; + int y2 = bboxes[i].y; + int x3 = bboxes[i].x; + int y3 = bboxes[i].y + bboxes[i].height; + int x4 = bboxes[i].x + bboxes[i].width; + int y4 = bboxes[i].y + bboxes[i].height; + + // convert unit from degree to radius + // rotate the corners + rotate(&x1, &y1, cx, cy, angle); + rotate(&x2, &y2, cx, cy, angle); + rotate(&x3, &y3, cx, cy, angle); + rotate(&x4, &y4, cx, cy, angle); + + // shrink the rotated corners to get an enclosing box + int x_min = min({x1, x2, x3, x4}); + int y_min = min({y1, y2, y3, y4}); + int x_max = max({x1, x2, x3, x4}); + int y_max = max({y1, y2, y3, y4}); + + x_min = clamp(x_min, 0, imgSize.width); + y_min = clamp(y_min, 0, imgSize.height); + x_max = clamp(x_max, 0, imgSize.width); + y_max = clamp(y_max, 0, imgSize.height); + + int w = x_max - x_min; + int h = y_max - y_min; + + if((float)(w * h) / (bboxes[i].width * bboxes[i].height) < threshold){ + bboxes.erase(bboxes.begin() + i); + labels.erase(labels.begin() + i); + }else{ + bboxes[i].x = x_min; + bboxes[i].y = y_min; + bboxes[i].width = w; + bboxes[i].height = h; + } + + } + } + + inline int clamp(int v, int lo, int hi){ + if(v < lo){ + return lo; + } + if(v > hi){ + return hi; + } + return v; + } + + inline void rotate(int* x, int* y, int cx, int cy, double angle){ + // NOTE: when the unit of angle is degree instead of radius, the result may be incorrect. 
+ (*x) = (int)round(((*x) - cx) * cos(angle) - ((*y) - cy) * sin(angle) + cx); + (*y) = (int)round(((*x) - cx) * sin(angle) + ((*y) - cy) * cos(angle) + cy); + } + } + } +} \ No newline at end of file diff --git a/modules/imgaug/test/test_imgaug.cpp b/modules/imgaug/test/test_imgaug.cpp new file mode 100644 index 00000000000..9f41fa83b8b --- /dev/null +++ b/modules/imgaug/test/test_imgaug.cpp @@ -0,0 +1,331 @@ +// This file is part of OpenCV project. +// It is subject to the license terms in the LICENSE file found in the top-level directory +// of this distribution and at http://opencv.org/license.html. +#include "test_precomp.hpp" + +namespace opencv_test{ namespace{ + + +TEST(Aug_RandomCrop, no_padding){ + cout << "run test: no_padding" << endl; + cvtest::TS* ts = cvtest::TS::ptr(); + string img_path = findDataFile("imgaug/lena.jpg"); + Mat input = imread(img_path); + + int th = 200; + int tw = 200; + + string ref_path = findDataFile("imgaug/random_crop_test_0.jpg"); + Mat ref = imread(ref_path); + + int seed = 0; + + cv::imgaug::setSeed(seed); + cv::imgaug::RandomCrop aug(Size(tw, th)); + Mat out; + aug.call(input, out); + + if ( out.rows > 0 && out.rows == ref.rows && out.cols > 0 && out.cols == ref.cols ) { + // Calculate the L2 relative error between images. + double errorL2 = cv::norm( out, ref, NORM_L2 ); + // Convert to a reasonable scale, since L2 error is summed across all pixels of the image. + double error = errorL2 / (double)( out.rows * out.cols ); + EXPECT_LE(error, 0.1); + }else{ + ts->set_failed_test_info(TS::FAIL_MISMATCH); + } +} + +TEST(Aug_RandomCrop, padding){ + cout << "run test: padding" << endl; + cvtest::TS* ts = cvtest::TS::ptr(); + string img_path = findDataFile("imgaug/lena.jpg"); + Mat input = imread(img_path); + + int seed = 0; + + int th = 200; + int tw = 200; + Vec4d padding {10, 20, 30, 40}; + + string ref_path = findDataFile("imgaug/random_crop_test_1.jpg"); + Mat ref = imread(ref_path); + + imgaug::setSeed(seed); + cv::imgaug::RandomCrop aug(Size(tw, th), padding); + Mat out; + aug.call(input, out); + + if ( out.rows > 0 && out.rows == ref.rows && out.cols > 0 && out.cols == ref.cols ) { + // Calculate the L2 relative error between images. + double errorL2 = cv::norm( out, ref, NORM_L2 ); + // Convert to a reasonable scale, since L2 error is summed across all pixels of the image. + double error = errorL2 / (double)( out.rows * out.cols ); + EXPECT_LE(error, 0.1); + }else{ + ts->set_failed_test_info(TS::FAIL_MISMATCH); + } +} + +TEST(Aug_RandomFlip, diagonal){ + cout << "run test: random flip (diagonal)" << endl; + cvtest::TS* ts = cvtest::TS::ptr(); + string img_path = findDataFile("imgaug/lena.jpg"); + Mat input = imread(img_path); + Mat out; + + string ref_path = findDataFile("imgaug/random_flip_test_2.jpg"); + Mat ref = imread(ref_path); + + cv::imgaug::RandomFlip aug(0, 1); + aug.call(input, out); + + if ( out.rows > 0 && out.rows == ref.rows && out.cols > 0 && out.cols == ref.cols ) { + // Calculate the L2 relative error between images. + double errorL2 = cv::norm( out, ref, NORM_L2 ); + // Convert to a reasonable scale, since L2 error is summed across all pixels of the image. 
+ double error = errorL2 / (double)( out.rows * out.cols ); + EXPECT_LE(error, 0.1); + }else{ + ts->set_failed_test_info(TS::FAIL_MISMATCH); + } +} + +TEST(Aug_Resize, basic){ + cout << "run test: resize (basic)" << endl; + cvtest::TS* ts = cvtest::TS::ptr(); + string img_path = findDataFile("imgaug/lena.jpg"); + Mat input = imread(img_path); + Mat out; + + string ref_path = findDataFile("imgaug/resize_test_3.jpg"); + Mat ref = imread(ref_path); + + cv::imgaug::Resize aug(cv::Size(256, 128)); + aug.call(input, out); + + if ( out.rows > 0 && out.rows == ref.rows && out.cols > 0 && out.cols == ref.cols ) { + // Calculate the L2 relative error between images. + double errorL2 = cv::norm( out, ref, NORM_L2 ); + // Convert to a reasonable scale, since L2 error is summed across all pixels of the image. + double error = errorL2 / (double)( out.rows * out.cols ); + EXPECT_LE(error, 0.1); + }else{ + ts->set_failed_test_info(TS::FAIL_MISMATCH); + } +} + + +TEST(Aug_CenterCrop, basic){ + cout << "run test: center crop (basic)" << endl; + cvtest::TS* ts = cvtest::TS::ptr(); + string img_path = findDataFile("imgaug/lena.jpg"); + Mat input = imread(img_path); + Mat out; + + string ref_path = findDataFile("imgaug/center_crop_test_4.jpg"); + Mat ref = imread(ref_path); + + cv::imgaug::CenterCrop aug(cv::Size(400, 300)); + aug.call(input, out); + if ( out.rows > 0 && out.rows == ref.rows && out.cols > 0 && out.cols == ref.cols ) { + // Calculate the L2 relative error between images. + double errorL2 = cv::norm( out, ref, NORM_L2 ); + // Convert to a reasonable scale, since L2 error is summed across all pixels of the image. + double error = errorL2 / (double)( out.rows * out.cols ); + EXPECT_LE(error, 0.1); + }else{ + ts->set_failed_test_info(TS::FAIL_MISMATCH); + } +} + + +TEST(Aug_Pad, basic){ + cout << "run test: pad (basic)" << endl; + cvtest::TS* ts = cvtest::TS::ptr(); + string img_path = findDataFile("imgaug/lena.jpg"); + Mat input = imread(img_path); + Mat out; + + string ref_path = findDataFile("imgaug/pad_test_5.jpg"); + Mat ref = imread(ref_path); + + cv::imgaug::Pad aug(Vec4i(10, 20, 30, 40), Scalar(0)); + aug.call(input, out); + if ( out.rows > 0 && out.rows == ref.rows && out.cols > 0 && out.cols == ref.cols ) { + // Calculate the L2 relative error between images. + double errorL2 = cv::norm( out, ref, NORM_L2 ); + // Convert to a reasonable scale, since L2 error is summed across all pixels of the image. + double error = errorL2 / (double)( out.rows * out.cols ); + EXPECT_LE(error, 0.1); + }else{ + ts->set_failed_test_info(TS::FAIL_MISMATCH); + } +} + +TEST(Aug_RandomResizedCrop, basic){ + cout << "run test: random resized crop (basic)" << endl; + cvtest::TS* ts = cvtest::TS::ptr(); + string img_path = findDataFile("imgaug/lena.jpg"); + Mat input = imread(img_path); + Mat out; + + cv::Size size(1024, 512); + uint64 seed = 10; + cv::imgaug::setSeed(seed); + + string ref_path = findDataFile("imgaug/random_resized_crop_test_6.jpg"); + Mat ref = imread(ref_path); + + cv::imgaug::RandomResizedCrop aug(size); + + aug.call(input, out); + if ( out.rows > 0 && out.rows == ref.rows && out.cols > 0 && out.cols == ref.cols ) { + // Calculate the L2 relative error between images. + double errorL2 = cv::norm( out, ref, NORM_L2 ); + // Convert to a reasonable scale, since L2 error is summed across all pixels of the image. 
+ double error = errorL2 / (double)( out.rows * out.cols ); + EXPECT_LE(error, 0.1); + }else{ + ts->set_failed_test_info(TS::FAIL_MISMATCH); + } +} + + +TEST(Aug_RandomRotation, not_expand){ + cout << "run test: random rotation (not_expand)" << endl; + cvtest::TS* ts = cvtest::TS::ptr(); + string img_path = findDataFile("imgaug/lena.jpg"); + Mat input = imread(img_path); + Mat out; + + cv::Vec2d degrees(-10, 10); + uint64 seed = 5; + cv::imgaug::setSeed(seed); + + string ref_path = findDataFile("imgaug/random_rotation_test_7.jpg"); + Mat ref = imread(ref_path); + + cv::imgaug::RandomRotation aug(degrees); + + aug.call(input, out); + + if ( out.rows > 0 && out.rows == ref.rows && out.cols > 0 && out.cols == ref.cols ) { + // Calculate the L2 relative error between images. + double errorL2 = cv::norm( out, ref, NORM_L2 ); + // Convert to a reasonable scale, since L2 error is summed across all pixels of the image. + double error = errorL2 / (double)( out.rows * out.cols ); + EXPECT_LE(error, 0.1); + }else{ + ts->set_failed_test_info(TS::FAIL_MISMATCH); + } +} + +TEST(Aug_GrayScale, basic){ + cout << "run test: gray scale (basic)" << endl; + cvtest::TS* ts = cvtest::TS::ptr(); + string img_path = findDataFile("imgaug/lena.jpg"); + Mat input = imread(img_path); + Mat out; + + string ref_path = findDataFile("imgaug/gray_scale_test_8.jpg"); + Mat ref = imread(ref_path, IMREAD_GRAYSCALE); + + cv::imgaug::GrayScale aug; + + aug.call(input, out); + + if ( out.rows > 0 && out.rows == ref.rows && out.cols > 0 && out.cols == ref.cols ) { + // Calculate the L2 relative error between images. + double errorL2 = cv::norm( out, ref, NORM_L2 ); + // Convert to a reasonable scale, since L2 error is summed across all pixels of the image. + double error = errorL2 / (double)( out.rows * out.cols ); + EXPECT_LE(error, 0.1); + }else{ + ts->set_failed_test_info(TS::FAIL_MISMATCH); + } +} + + +TEST(Aug_GaussianBlur, basic){ + cout << "run test: gaussian blur (basic)" << endl; + cvtest::TS* ts = cvtest::TS::ptr(); + string img_path = findDataFile("imgaug/lena.jpg"); + Mat input = imread(img_path); + Mat out; + + string ref_path = findDataFile("imgaug/gaussian_blur_test_9.jpg"); + Mat ref = imread(ref_path); + cv::imgaug::setSeed(15); + cv::imgaug::GaussianBlur aug(Size(5, 5)); + + aug.call(input, out); + + if ( out.rows > 0 && out.rows == ref.rows && out.cols > 0 && out.cols == ref.cols ) { + // Calculate the L2 relative error between images. + double errorL2 = cv::norm( out, ref, NORM_L2 ); + // Convert to a reasonable scale, since L2 error is summed across all pixels of the image. + double error = errorL2 / (double)( out.rows * out.cols ); + EXPECT_LE(error, 0.1); + }else{ + ts->set_failed_test_info(TS::FAIL_MISMATCH); + } +} + + +TEST(Aug_Normalize, basic){ + cout << "run test: gaussian blur (basic)" << endl; + cvtest::TS* ts = cvtest::TS::ptr(); + string img_path = findDataFile("imgaug/lena.jpg"); + Mat input = imread(img_path); + Mat out; + + string ref_path = findDataFile("imgaug/normalize_test_10.jpg"); + Mat ref = imread(ref_path); + cv::imgaug::setSeed(15); + // Mean and std for ImageNet is [0.485, 0.456, 0.406], [0.229, 0.224, 0.225] in order of RGB. 
+ // For order of BGR, they should be (0.406, 0.456, 0.485), (0.225, 0.224, 0.229) + cv::imgaug::Normalize aug(Scalar(0.406, 0.456, 0.485), Scalar(0.225, 0.224, 0.229)); + aug.call(input, out); + out.convertTo(out, CV_8UC3, 255); + + if ( out.rows > 0 && out.rows == ref.rows && out.cols > 0 && out.cols == ref.cols ) { + // Calculate the L2 relative error between images. + double errorL2 = cv::norm( out, ref, NORM_L2 ); + // Convert to a reasonable scale, since L2 error is summed across all pixels of the image. + double error = errorL2 / (double)( out.rows * out.cols ); + EXPECT_LE(error, 0.1); + }else{ + ts->set_failed_test_info(TS::FAIL_MISMATCH); + } +} + + +TEST(Aug_ColorJitter, basic){ + cout << "run test: color jitter (basic)" << endl; + cvtest::TS* ts = cvtest::TS::ptr(); + string img_path = findDataFile("imgaug/lena.jpg"); + Mat input = imread(img_path); + Mat out; + + string ref_path = findDataFile("imgaug/color_jitter_test_11.jpg"); + Mat ref = imread(ref_path); + cv::imgaug::setSeed(15); + // Mean and std for ImageNet is [0.485, 0.456, 0.406], [0.229, 0.224, 0.225] in order of RGB. + // For order of BGR, they should be (0.406, 0.456, 0.485), (0.225, 0.224, 0.229) + cv::imgaug::ColorJitter aug(cv::Vec2d(0, 2), cv::Vec2d(0, 2), cv::Vec2d(0, 2), cv::Vec2d(-0.5, 0.5)); + aug.call(input, out); + + if ( out.rows > 0 && out.rows == ref.rows && out.cols > 0 && out.cols == ref.cols ) { + // Calculate the L2 relative error between images. + double errorL2 = cv::norm( out, ref, NORM_L2 ); + // Convert to a reasonable scale, since L2 error is summed across all pixels of the image. + double error = errorL2 / (double)( out.rows * out.cols ); + EXPECT_LE(error, 0.1); + }else{ + ts->set_failed_test_info(TS::FAIL_MISMATCH); + } +} + + +}} diff --git a/modules/imgaug/test/test_imgaug_det.cpp b/modules/imgaug/test/test_imgaug_det.cpp new file mode 100644 index 00000000000..4f4edaa8179 --- /dev/null +++ b/modules/imgaug/test/test_imgaug_det.cpp @@ -0,0 +1,254 @@ +// This file is part of OpenCV project. +// It is subject to the license terms in the LICENSE file found in the top-level directory +// of this distribution and at http://opencv.org/license.html. 
+#include "test_precomp.hpp" + +namespace opencv_test{ namespace{ + +void read_annotation(const String& path, std::vector& bboxes, std::vector& labels){ + FILE* fp; + fp = fopen(path.c_str(), "rt"); + + int n; + int sig; + sig = fscanf(fp, "%d", &n); + CV_Assert(sig != EOF); + + for(int i=0; i < n; i++){ + int x, y, w, h, l; + sig = fscanf(fp, "%d %d %d %d %d\n", &x, &y, &w, &h, &l); + CV_Assert(sig != EOF); + bboxes.push_back(Rect(x, y, w, h)); + labels.push_back(l); + } + + fclose(fp); +} + + +TEST(Aug_Det_RandomFlip, vertical){ + cvtest::TS* ts = cvtest::TS::ptr(); + string img_path = findDataFile("imgaug/lena.jpg"); + Mat src = imread(img_path); + Mat out; + + int seed = 0; + cv::imgaug::setSeed(seed); + + + string ref_path = findDataFile("imgaug/det_random_flip_test_0.jpg"); + Mat ref = imread(ref_path); + + std::vector ref_bboxes; + std::vector ref_labels; + + String ref_data = findDataFile("imgaug/det_random_flip_test_0.dat"); + read_annotation(ref_data, ref_bboxes, ref_labels); + + + std::vector bboxes{ + Rect{112, 40, 249, 343}, + Rect{61, 273, 113, 228} + }; + + std::vector labels{1, 2}; + + int flipCode = 0; + cv::imgaug::det::RandomFlip aug(flipCode); + aug.call(src, out, bboxes, labels); + + if ( out.rows > 0 && out.rows == ref.rows && out.cols > 0 && out.cols == ref.cols && ref_bboxes.size() == bboxes.size() && ref_labels.size() == labels.size()) { + EXPECT_EQ(bboxes, ref_bboxes); + EXPECT_EQ(labels, ref_labels); + + // Calculate the L2 relative error between images. + double errorL2 = cv::norm( out, ref, NORM_L2 ); + // Convert to a reasonable scale, since L2 error is summed across all pixels of the image. + double error = errorL2 / (double)( out.rows * out.cols ); + EXPECT_LE(error, 0.1); + }else{ + ts->set_failed_test_info(TS::FAIL_MISMATCH); + } +} + +TEST(Aug_Det_Resize, small){ + cvtest::TS* ts = cvtest::TS::ptr(); + string img_path = findDataFile("imgaug/lena.jpg"); + Mat src = imread(img_path); + Mat out; + + int seed = 0; + cv::imgaug::setSeed(seed); + + + string ref_path = findDataFile("imgaug/det_resize_test_0.jpg"); + Mat ref = imread(ref_path); + + std::vector ref_bboxes; + std::vector ref_labels; + + String ref_data = findDataFile("imgaug/det_resize_test_0.dat"); + read_annotation(ref_data, ref_bboxes, ref_labels); + + + std::vector bboxes{ + Rect{112, 40, 249, 343}, + Rect{61, 273, 113, 228} + }; + + std::vector labels{1, 2}; + + Size size(224, 224); + cv::imgaug::det::Resize aug(size); + aug.call(src, out, bboxes, labels); + + if ( out.rows > 0 && out.rows == ref.rows && out.cols > 0 && out.cols == ref.cols && ref_bboxes.size() == bboxes.size() && ref_labels.size() == labels.size()) { + EXPECT_EQ(bboxes, ref_bboxes); + EXPECT_EQ(labels, ref_labels); + + // Calculate the L2 relative error between images. + double errorL2 = cv::norm( out, ref, NORM_L2 ); + // Convert to a reasonable scale, since L2 error is summed across all pixels of the image. 
+ double error = errorL2 / (double)( out.rows * out.cols ); + EXPECT_LE(error, 0.1); + }else{ + ts->set_failed_test_info(TS::FAIL_MISMATCH); + } +} + + +TEST(Aug_Det_Convert, BGR2GRAY){ + cvtest::TS* ts = cvtest::TS::ptr(); + string img_path = findDataFile("imgaug/lena.jpg"); + Mat src = imread(img_path); + Mat out; + + int seed = 0; + cv::imgaug::setSeed(seed); + + + string ref_path = findDataFile("imgaug/det_convert_test_0.jpg"); + Mat ref = imread(ref_path, IMREAD_GRAYSCALE); + + std::vector ref_bboxes; + std::vector ref_labels; + + String ref_data = findDataFile("imgaug/det_convert_test_0.dat"); + read_annotation(ref_data, ref_bboxes, ref_labels); + + + std::vector bboxes{ + Rect{112, 40, 249, 343}, + Rect{61, 273, 113, 228} + }; + + std::vector labels{1, 2}; + + int code = COLOR_BGR2GRAY; + cv::imgaug::det::Convert aug(code); + aug.call(src, out, bboxes, labels); + + if ( out.rows > 0 && out.rows == ref.rows && out.cols > 0 && out.cols == ref.cols && ref_bboxes.size() == bboxes.size() && ref_labels.size() == labels.size()) { + EXPECT_EQ(bboxes, ref_bboxes); + EXPECT_EQ(labels, ref_labels); + + // Calculate the L2 relative error between images. + double errorL2 = cv::norm( out, ref, NORM_L2 ); + // Convert to a reasonable scale, since L2 error is summed across all pixels of the image. + double error = errorL2 / (double)( out.rows * out.cols ); + EXPECT_LE(error, 0.1); + }else{ + ts->set_failed_test_info(TS::FAIL_MISMATCH); + } +} + + +TEST(Aug_Det_RandomTranslation, no_drop){ + cvtest::TS* ts = cvtest::TS::ptr(); + string img_path = findDataFile("imgaug/lena.jpg"); + Mat src = imread(img_path); + Mat out; + + int seed = 0; + cv::imgaug::setSeed(seed); + + string ref_path = findDataFile("imgaug/det_random_translation_test_0.jpg"); + Mat ref = imread(ref_path, IMREAD_COLOR); + + std::vector ref_bboxes; + std::vector ref_labels; + + String ref_data = findDataFile("imgaug/det_random_translation_test_0.dat"); + read_annotation(ref_data, ref_bboxes, ref_labels); + + std::vector bboxes{ + Rect{112, 40, 249, 343}, + Rect{61, 273, 113, 228} + }; + + std::vector labels{1, 2}; + + Vec2d trans(20, 20); + cv::imgaug::det::RandomTranslation aug(trans); + aug.call(src, out, bboxes, labels); + + if ( out.rows > 0 && out.rows == ref.rows && out.cols > 0 && out.cols == ref.cols && ref_bboxes.size() == bboxes.size() && ref_labels.size() == labels.size()) { + EXPECT_EQ(bboxes, ref_bboxes); + EXPECT_EQ(labels, ref_labels); + + // Calculate the L2 relative error between images. + double errorL2 = cv::norm( out, ref, NORM_L2 ); + // Convert to a reasonable scale, since L2 error is summed across all pixels of the image. 
+ double error = errorL2 / (double)( out.rows * out.cols ); + EXPECT_LE(error, 0.1); + }else{ + ts->set_failed_test_info(TS::FAIL_MISMATCH); + } +} + + +TEST(Aug_Det_RandomRotation, no_drop){ + cvtest::TS* ts = cvtest::TS::ptr(); + string img_path = findDataFile("imgaug/lena.jpg"); + Mat src = imread(img_path); + Mat out; + + int seed = 0; + cv::imgaug::setSeed(seed); + + string ref_path = findDataFile("imgaug/det_random_rotation_test_0.jpg"); + Mat ref = imread(ref_path, IMREAD_COLOR); + + std::vector ref_bboxes; + std::vector ref_labels; + + String ref_data = findDataFile("imgaug/det_random_rotation_test_0.dat"); + read_annotation(ref_data, ref_bboxes, ref_labels); + + std::vector bboxes{ + Rect{112, 40, 249, 343}, + Rect{61, 273, 113, 228} + }; + + std::vector labels{1, 2}; + + Vec2d degrees(-30, 30); + cv::imgaug::det::RandomRotation aug(degrees); + aug.call(src, out, bboxes, labels); + + if ( out.rows > 0 && out.rows == ref.rows && out.cols > 0 && out.cols == ref.cols && ref_bboxes.size() == bboxes.size() && ref_labels.size() == labels.size()) { + EXPECT_EQ(bboxes, ref_bboxes); + EXPECT_EQ(labels, ref_labels); + + // Calculate the L2 relative error between images. + double errorL2 = cv::norm( out, ref, NORM_L2 ); + // Convert to a reasonable scale, since L2 error is summed across all pixels of the image. + double error = errorL2 / (double)( out.rows * out.cols ); + EXPECT_LE(error, 0.1); + }else{ + ts->set_failed_test_info(TS::FAIL_MISMATCH); + } +} + + +}} diff --git a/modules/imgaug/test/test_main.cpp b/modules/imgaug/test/test_main.cpp new file mode 100644 index 00000000000..0e51ddfd050 --- /dev/null +++ b/modules/imgaug/test/test_main.cpp @@ -0,0 +1,6 @@ +// This file is part of OpenCV project. +// It is subject to the license terms in the LICENSE file found in the top-level directory +// of this distribution and at http://opencv.org/license.html. +#include "test_precomp.hpp" + +CV_TEST_MAIN("cv") diff --git a/modules/imgaug/test/test_precomp.hpp b/modules/imgaug/test/test_precomp.hpp new file mode 100644 index 00000000000..d7ffec30338 --- /dev/null +++ b/modules/imgaug/test/test_precomp.hpp @@ -0,0 +1,13 @@ +// This file is part of OpenCV project. +// It is subject to the license terms in the LICENSE file found in the top-level directory +// of this distribution and at http://opencv.org/license.html. 
+#ifndef __OPENCV_TEST_PRECOMP_HPP__ +#define __OPENCV_TEST_PRECOMP_HPP__ + +#include "opencv2/ts.hpp" +#include "opencv2/imgaug.hpp" + +static uint64 seed=0; +static cv::RNG rng(seed); + +#endif \ No newline at end of file diff --git a/modules/imgaug/tutorials/imgaug_basic_usage/images/compose_out.jpg b/modules/imgaug/tutorials/imgaug_basic_usage/images/compose_out.jpg new file mode 100644 index 00000000000..cc590a08748 Binary files /dev/null and b/modules/imgaug/tutorials/imgaug_basic_usage/images/compose_out.jpg differ diff --git a/modules/imgaug/tutorials/imgaug_basic_usage/images/lena.jpg b/modules/imgaug/tutorials/imgaug_basic_usage/images/lena.jpg new file mode 100644 index 00000000000..add8374dfff Binary files /dev/null and b/modules/imgaug/tutorials/imgaug_basic_usage/images/lena.jpg differ diff --git a/modules/imgaug/tutorials/imgaug_basic_usage/images/random_crop_out.jpg b/modules/imgaug/tutorials/imgaug_basic_usage/images/random_crop_out.jpg new file mode 100644 index 00000000000..d657372b05c Binary files /dev/null and b/modules/imgaug/tutorials/imgaug_basic_usage/images/random_crop_out.jpg differ diff --git a/modules/imgaug/tutorials/imgaug_basic_usage/imgaug_basic_usage.markdown b/modules/imgaug/tutorials/imgaug_basic_usage/imgaug_basic_usage.markdown new file mode 100644 index 00000000000..a835678c38c --- /dev/null +++ b/modules/imgaug/tutorials/imgaug_basic_usage/imgaug_basic_usage.markdown @@ -0,0 +1,236 @@ +Data augmentation with imgaug {#tutorial_imgaug_basic_usage} +============================================================ + +@tableofcontents + +@next_tutorial{tutorial_imgaug_object_detection} + +| | | +| -: | :- | +| Author | Chuyang Zhao | +| Compatibility | OpenCV >= 4.0 | + + +Introduction +------ +From [Wikipedia](https://en.wikipedia.org/wiki/Data_augmentation), **data augmentation** are techniques used to increase the amount of data +by adding slightly modified copies of already existing data or newly created synthetic data from existing data. +It acts as a regularizer and helps reduce overfitting when training a machine learning model. + +In a narrow sense, data augmentation is to perform some sort of transforms on given images and generate the modified +images as additional training data, but broadly speaking, data augmentation can perform not only on images. +For computer vision tasks like object detection and semantic segmentation, the inputs contain not only images +but also annotation on the source images. So in these tasks, data augmentation should be able to perform transforms on +all these data. + +The imgaug module implemented in OpenCV takes both these requirements into account. You can use the imgaug module +for a wide range of computer vision tasks. +The imgaug module in OpenCV is implemented in pure C++ and is backend with OpenCV efficient image processing operations, +so it runs much faster and more efficiently than other existing Python-based implementation such as torchvision. Powered with OpenCV, the imgaug module +is cross-platform and can convert to other languages easily. This is especially useful when we want to +deploy our model along with its data preprocessing pipeline to the production environment for better inference speed. +With this feature, we can also use imgaug on other devices such as embedded systems and mobile phones easily. 
+ +Goal +---- +In this tutorial, you will learn: +- How to use **imgaug** to perform data augmentation for images +- How to compose multiple methods into one data augmentation method +- How to change the seed of the random number generator used in **imgaug** + + +Usage +----- +In this section, I will use some methods in imgaug to demonstrate how to use imgaug to perform data augmentation on images. +For the details of all the methods in imgaug, please refer to the documentation @ref cv::imgaug . + +### Apply single data augmentation method +@add_toggle_cpp +In C++ environment, to use imgaug module you should include the header file: + +@code{.cpp} +#include +@endcode + +We call the constructor of the data augmentation class to get its initialized instance. +Here we get the instance of cv::imgaug::RandomCrop to perform random crop on the given images. cv::imgaug::RandomCrop requires parameter `sz` +which is the size of the cropped area on the given image, here we pass cv::Size(300, 300) for this parameter. + +@code{.cpp} +imgaug::RandomCrop randomCrop(cv::Size(300, 300)); +@endcode + +Then we read the source image in format cv::Mat and performs the data augmentation operation on it by calling cv::imgaug::RandomCrop::call function. + +@code{.cpp} +Mat src = imread(samples::findFile("lena.jpg"), IMREAD_COLOR); +Mat dst; +randomCrop.call(src, dst); +@endcode + +The original image is as follows: + +![](images/lena.jpg) + +You can display the augmented image after applying random crop by: + +@code{.cpp} +imshow("result", dst); +waitKey(0); +@endcode + +![](images/random_crop_out.jpg) + +@end_toggle + +@add_toggle_python +In Python, to use imgaug module you should import the following package: + +@code{.py} +from cv2 import imgaug +@endcode + +We call the constructor of the data augmentation class to get its initialized instance. +Here we get an instance of **cv::imgaug::RandomCrop** to perform random crop on the given images. **cv::imgaug::RandomCrop** requires a parameter `sz` +which is the size of the cropped area on the given image, here we pass a two-elements tuple `(300, 300)` for this parameter. + +@code{.py} +randomCrop = imgaug.RandomCrop(sz=(300, 300)) +@endcode + +Then we read the source image with **cv::imread** and performs the data augmentation operation on it by calling **cv::imgaug::RandomCrop::call** function. + +@code{.py} +src = cv2.imread("lena.jpg", cv2.IMREAD_COLOR) +dst = randomCrop.call(src) +@endcode + +The original image is as follows: + +![](images/lena.jpg) + +You can display the augmented image after applying random crop by: + +@code{.py} +cv2.imshow("result", dst) +cv2.waitKey(0) +@endcode + +![](images/random_crop_out.jpg) + +@end_toggle + +### Compose multiple data augmentation methods +@add_toggle_cpp +To compose multiple data augmentation methods into one, firstly you need to +initialize the data augmentation classes you want to use later: + +@code{.cpp} +imgaug::RandomCrop randomCrop(cv::Size(300, 300)); +imgaug::RandomFlip randomFlip(1); +imgaug::Resize resize(cv::Size(224, 224)); +@endcode + +Because in **cv::imgaug::Compose**, we call each data augmentation method by the pointer of their +base class **cv::imgaug::Transform**. We need to use a vector of type **cv::Ptr** to +store the addresses of all data augmentation instances. + +@code{.cpp} +std::vector > transforms {&randomCrop, &randomFlip, &resize}; +@endcode + +Then we construct the **cv::imgaug::Compose** class by passing `transforms` as the required argument. 
+ +@code{.cpp} +imgaug::Compose aug(transforms); +@endcode + +We call the compose method the same way as normal data augmentation methods. The composed +method will call all the methods in `transforms` on the given image sequentially: + +@code{.cpp} +Mat src = imread(samples::findFile("lena.jpg"), IMREAD_COLOR); +Mat dst; +aug.call(src, dst); +@endcode + +Here is the result we get: + +![](images/compose_out.jpg) + +@end_toggle + +@add_toggle_python +To compose multiple data augmentation methods into one, firstly you need to +initialize the data augmentation classes you want to use later: + +@code{.py} +randomCrop = imgaug.RandomCrop((300, 300)) +randomFlip = imgaug.RandomFlip(1) +resize = imgaug.Resize((224, 224)) +@endcode + +We store all data augmentation instances in a list. + +@code{.py} +transforms = [randomCrop, randomFlip, resize] +@endcode + +Then we initialize the cv::imgaug::Compose class by passing the list of all data augmentation instances as the argument. + +@code{.py} +aug = imgaug.Compose(transforms) +@endcode + +We call the compose method the same way as normal data augmentation methods. +The composed method will apply all the data augmentation methods in transforms list to the given image sequentially. + +@code{.py} +src = cv2.imread("lena.jpg", cv2.IMREAD_COLOR) +dst = aug.call(src) +@endcode + +Here is the result we get: + +![](images/compose_out.jpg) + +@end_toggle + +### Change the seed of random number generator +@add_toggle_cpp +In imgaug, we use **cv::imgaug::rng** as our random number generator. The role of rng is to generate +random numbers for some random methods. For example, in cv::imgaug::RandomCrop we need to generate the coordinates +of the upper-left corner of the cropped rectangle randomly, in which we will use `rng` to generate random +numbers in valid range. When a random number is generated by `rng`, the internal state of `rng` will change. +Thus, we probably won't get the same result when we call the same method again. In the above process, the most +important thing is the initial state of `rng`, which determines the subsequent numbers `rng` generated. So in some +cases if you want to replicate other one's results, or if you want to make sure the random values generated will be +different the next time you run the same program. You can manually set the initial state of the `rng` by calling +**cv::imgaug::setSeed**. By default, if you don't manually set the initial state of `rng`, its initial state will be +set to the tick count since it was first initialized. + +@code{.cpp} +int seed = 1234; +imgaug::setSeed(seed); +@endcode + +@end_toggle + +@add_toggle_python +In imgaug, we use **cv::imgaug::rng** as our random number generator. The role of rng is to generate +random numbers for some random methods. For example, in cv::imgaug::RandomCrop we need to generate the coordinates +of the upper-left corner of the cropped rectangle randomly, in which we will use `rng` to generate random +numbers in valid range. When a random number is generated by `rng`, the internal state of `rng` will change. +Thus, we probably won't get the same result when we call the same method again. In the above process, the most +important thing is the initial state of `rng`, which determines the subsequent numbers `rng` generated. So in some +cases if you want to replicate other one's results, or if you want to make sure the random values generated will be +different the next time you run the same program. 
In such cases, you can manually set the initial state of the `rng` by calling
+**cv::imgaug::setSeed**. By default, if you do not set it manually, the initial state of `rng` is
+set to the tick count at the moment it was first initialized.
+
+@code{.py}
+seed = 1234
+imgaug.setSeed(seed)
+@endcode
+
+@end_toggle
\ No newline at end of file
diff --git a/modules/imgaug/tutorials/imgaug_obj_det/images/det_compose_out.jpg b/modules/imgaug/tutorials/imgaug_obj_det/images/det_compose_out.jpg
new file mode 100644
index 00000000000..b874c481589
Binary files /dev/null and b/modules/imgaug/tutorials/imgaug_obj_det/images/det_compose_out.jpg differ
diff --git a/modules/imgaug/tutorials/imgaug_obj_det/images/det_rotation_out.jpg b/modules/imgaug/tutorials/imgaug_obj_det/images/det_rotation_out.jpg
new file mode 100644
index 00000000000..1017ee95331
Binary files /dev/null and b/modules/imgaug/tutorials/imgaug_obj_det/images/det_rotation_out.jpg differ
diff --git a/modules/imgaug/tutorials/imgaug_obj_det/images/det_src.jpg b/modules/imgaug/tutorials/imgaug_obj_det/images/det_src.jpg
new file mode 100644
index 00000000000..dbfdc107b85
Binary files /dev/null and b/modules/imgaug/tutorials/imgaug_obj_det/images/det_src.jpg differ
diff --git a/modules/imgaug/tutorials/imgaug_obj_det/imgaug_obj_det.markdown b/modules/imgaug/tutorials/imgaug_obj_det/imgaug_obj_det.markdown
new file mode 100644
index 00000000000..b5a1dd72eb6
--- /dev/null
+++ b/modules/imgaug/tutorials/imgaug_obj_det/imgaug_obj_det.markdown
@@ -0,0 +1,218 @@
+Data augmentation with imgaug in object detection {#tutorial_imgaug_object_detection}
+==============================
+
+@tableofcontents
+
+@prev_tutorial{tutorial_imgaug_basic_usage}
+@next_tutorial{tutorial_imgaug_pytorch}
+
+| | |
+| -: | :- |
+| Author | Chuyang Zhao |
+| Compatibility | OpenCV >= 4.0 |
+
+
+Introduction
+------
+In the previous tutorial, we demonstrated how to use imgaug to perform transforms on plain images.
+In some tasks, the inputs contain not only images but also their annotations. The imgaug
+module has been extended to support most mainstream computer vision tasks. Here we demonstrate how to use imgaug for
+object detection.
+
+
+Goal
+----
+In this tutorial, you will learn:
+- How to use imgaug to perform data augmentation on the data of an object detection task
+
+
+The inputs of an object detection task contain the source image, the annotated bounding boxes, and the class label
+for each bounding box. In C++, the input image is represented as cv::Mat, the annotated bounding boxes can be represented
+as `std::vector<cv::Rect>` in which each bounding box is a cv::Rect, and the labels of the objects inside the
+bounding boxes can be represented as `std::vector<int>`.
+
+The data augmentation methods for object detection are implemented under the namespace cv::imgaug::det; you can
+find the details of all implemented methods in the documentation of cv::imgaug::det.
+
+
+Usage
+-----
+### Apply single data augmentation method
+@add_toggle_cpp
+
+To use the imgaug module in object detection, we need to include the header file:
+
+@code{.cpp}
+#include <opencv2/imgaug.hpp>
+@endcode
+
+Take random rotation as an example: we first initialize a cv::imgaug::det::RandomRotation instance by:
+
+@code{.cpp}
+imgaug::det::RandomRotation aug(Vec2d(-30, 30));
+@endcode
+
+The first argument cv::Vec2d(-30, 30) is the range from which the rotation angle (in degrees) is uniformly sampled.
+
+Then we read the source image and load its annotation data, which include bounding boxes and class labels.
+
+@end_toggle
\ No newline at end of file
diff --git a/modules/imgaug/tutorials/imgaug_obj_det/images/det_compose_out.jpg b/modules/imgaug/tutorials/imgaug_obj_det/images/det_compose_out.jpg
new file mode 100644
index 00000000000..b874c481589
Binary files /dev/null and b/modules/imgaug/tutorials/imgaug_obj_det/images/det_compose_out.jpg differ
diff --git a/modules/imgaug/tutorials/imgaug_obj_det/images/det_rotation_out.jpg b/modules/imgaug/tutorials/imgaug_obj_det/images/det_rotation_out.jpg
new file mode 100644
index 00000000000..1017ee95331
Binary files /dev/null and b/modules/imgaug/tutorials/imgaug_obj_det/images/det_rotation_out.jpg differ
diff --git a/modules/imgaug/tutorials/imgaug_obj_det/images/det_src.jpg b/modules/imgaug/tutorials/imgaug_obj_det/images/det_src.jpg
new file mode 100644
index 00000000000..dbfdc107b85
Binary files /dev/null and b/modules/imgaug/tutorials/imgaug_obj_det/images/det_src.jpg differ
diff --git a/modules/imgaug/tutorials/imgaug_obj_det/imgaug_obj_det.markdown b/modules/imgaug/tutorials/imgaug_obj_det/imgaug_obj_det.markdown
new file mode 100644
index 00000000000..b5a1dd72eb6
--- /dev/null
+++ b/modules/imgaug/tutorials/imgaug_obj_det/imgaug_obj_det.markdown
@@ -0,0 +1,218 @@
+Data augmentation with imgaug in object detection {#tutorial_imgaug_object_detection}
+==============================
+
+@tableofcontents
+
+@prev_tutorial{tutorial_imgaug_basic_usage}
+@next_tutorial{tutorial_imgaug_pytorch}
+
+| | |
+| -: | :- |
+| Author | Chuyang Zhao |
+| Compatibility | OpenCV >= 4.0 |
+
+
+Introduction
+------
+In the previous tutorial, we demonstrated how to use imgaug to perform transforms on plain images.
+In some tasks, the inputs contain not only images but also annotations. We extend the imgaug
+module to support most of the mainstream computer vision tasks. Here we demonstrate how to use imgaug for
+object detection.
+
+
+Goal
+----
+In this tutorial, you will learn:
+- How to use imgaug to perform data augmentation on the data of an object detection task
+
+
+The inputs of an object detection task contain the source image, the annotated bounding boxes, and the class label
+for each bounding box. In C++, the input image is represented as a cv::Mat, the annotated bounding boxes can be represented
+as a `std::vector<Rect>` in which each bounding box is a cv::Rect, and the class labels of the objects in the
+bounding boxes can be represented as a `std::vector<int>`.
+
+The data augmentation methods for object detection are implemented under the namespace cv::imgaug::det; you can
+find more details about all the implemented methods in the documentation of cv::imgaug::det.
+
+
+Usage
+-----
+### Apply a single data augmentation method
+@add_toggle_cpp
+
+To use the imgaug module in object detection, we need to include the header file:
+
+@code{.cpp}
+#include <opencv2/imgaug.hpp>
+@endcode
+
+Take random rotation as an example: we first initialize a cv::imgaug::det::RandomRotation instance by:
+
+@code{.cpp}
+imgaug::det::RandomRotation aug(Vec2d(-30, 30));
+@endcode
+
+The first argument cv::Vec2d(-30, 30) is the range from which the rotation angle will be uniformly sampled.
+
+Then we read the source image and load its annotation data, which includes bounding boxes and class labels.
+In the following example, the annotation data contains two bounding boxes and two class labels:
+
+@code{.cpp}
+Mat src = imread(samples::findFile("lena.jpg"), IMREAD_COLOR);
+Mat dst;
+
+std::vector<Rect> bboxes{
+    Rect{112, 40, 249, 343},
+    Rect{61, 273, 113, 228}
+};
+
+std::vector<int> labels {1, 2};
+@endcode
+
+The bounding boxes on the source image are as follows:
+
+![](images/det_src.jpg)
+
+Then we apply the random rotation to the given image and its annotations with imgaug::det::RandomRotation::call:
+
+@code{.cpp}
+aug.call(src, dst, bboxes, labels);
+@endcode
+
+The augmented image and its annotations are as follows:
+
+![](images/det_rotation_out.jpg)
+
+Complete code of this example:
+@include imgaug/samples/det_sample.cpp
+
+@end_toggle
+
+@add_toggle_python
+
+In Python, you should import the following package:
+
+@code{.py}
+from cv2 import imgaug
+@endcode
+
+Be aware that the data augmentation methods for object detection are all in the submodule `cv2.imgaug.det`.
+
+Take random rotation as an example: we first initialize a cv::imgaug::det::RandomRotation instance by:
+
+@code{.py}
+aug = imgaug.det.RandomRotation((-30, 30))
+@endcode
+
+The first argument (-30, 30) is the range from which the rotation angle will be uniformly sampled.
+
+Then we read the source image and load its annotation data, which includes bounding boxes and class labels.
+In the following example, the annotation data contains two bounding boxes and two class labels:
+
+@code{.py}
+src = cv2.imread("lena.jpg", cv2.IMREAD_COLOR)
+
+bboxes = [
+    (112, 40, 249, 343),
+    (61, 273, 113, 228)
+]
+
+labels = [1, 2]
+@endcode
+
+@note We represent each bounding box with a four-element tuple (x, y, w, h),
+in which x and y are the coordinates of the top-left corner of the bounding box,
+and w and h are its width and height. The binding generator will
+convert the tuple into a cv::Rect in C++. Please make sure the elements in the tuple
+are in the right order.
+
+The bounding boxes on the source image are as follows:
+
+![](images/det_src.jpg)
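+
+In case you want to reproduce this visualization yourself, the boxes can be drawn with cv2.rectangle
+(a small sketch; it only relies on the (x, y, w, h) layout described in the note above):
+
+@code{.py}
+vis = src.copy()
+for (x, y, w, h) in bboxes:
+    # cv2.rectangle expects the top-left and bottom-right corners
+    cv2.rectangle(vis, (x, y), (x + w, y + h), (0, 255, 0), 2)
+cv2.imwrite("det_src_vis.jpg", vis)
+@endcode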
+
+Then we apply the random rotation to the given image and its annotations with imgaug::det::RandomRotation::call:
+
+@code{.py}
+dst = aug.call(src, bboxes, labels)
+@endcode
+
+The augmented image and its annotations are as follows:
+
+![](images/det_rotation_out.jpg)
+
+Complete code of this example:
+@include imgaug/samples/det_sample.cpp
+
+@end_toggle
+
+### Compose multiple data augmentation methods
+@add_toggle_cpp
+Composing multiple data augmentation methods into one in the object detection module (cv::imgaug::det) is similar to doing it in the basic imgaug module (cv::imgaug).
+We also need to initialize multiple data augmentation instances from imgaug::det:
+
+@code{.cpp}
+imgaug::det::RandomRotation randomRotation(Vec2d(-30, 30));
+imgaug::det::RandomFlip randomFlip(1);
+imgaug::det::Resize resize(Size(224, 224));
+@endcode
+
+Different from the data augmentation classes in cv::imgaug, the data augmentation classes in cv::imgaug::det inherit from the base class
+cv::imgaug::det::Transform, so we use pointers of type cv::imgaug::det::Transform to store the addresses of the data augmentation
+instances in the det module. We store their pointers in a vector and then initialize the imgaug::det::Compose class with this vector:
+
+@code{.cpp}
+std::vector<imgaug::det::Transform*> transforms {&randomRotation, &randomFlip, &resize};
+imgaug::det::Compose aug(transforms);
+@endcode
+
+@warning You cannot compose data augmentation methods in the cv::imgaug::det module with methods in the cv::imgaug module,
+because they do not inherit from the same base class. You can only compose methods from the same module.
+
+Then we can call the composed method on the given image and its annotations as follows:
+
+@code{.cpp}
+aug.call(src, dst, bboxes, labels);
+@endcode
+
+The augmented image and its annotations are as follows:
+
+![](images/det_compose_out.jpg)
+
+Complete code of this example:
+@include imgaug/samples/det_compose_sample.cpp
+
+@end_toggle
+
+@add_toggle_python
+Composing multiple data augmentation methods into one in the object detection module (cv::imgaug::det) is similar to doing it in the basic imgaug module (cv::imgaug).
+We also need to initialize multiple data augmentation instances from imgaug::det:
+
+@code{.py}
+randomRotation = imgaug.det.RandomRotation((-30, 30))
+randomFlip = imgaug.det.RandomFlip(1)
+resize = imgaug.det.Resize((224, 224))
+@endcode
+
+We store all these instances in a list `transforms` and pass it as the argument to initialize the Compose class.
+
+@code{.py}
+transforms = [randomRotation, randomFlip, resize]
+aug = imgaug.det.Compose(transforms)
+@endcode
+
+@warning You cannot compose data augmentation methods in the cv::imgaug::det module with methods in the cv::imgaug module,
+because they do not inherit from the same base class. You can only compose methods from the same module.
+
+Then we can call the composed method on the given image and its annotations as follows:
+
+@code{.py}
+dst = aug.call(src, bboxes, labels)
+@endcode
+
+The augmented image and its annotations are as follows:
+
+![](images/det_compose_out.jpg)
+
+Complete code of this example:
+@include imgaug/samples/det_compose_sample.cpp
+
+@end_toggle
\ No newline at end of file
diff --git a/modules/imgaug/tutorials/imgaug_pytorch/imgaug_pytorch.markdown b/modules/imgaug/tutorials/imgaug_pytorch/imgaug_pytorch.markdown
new file mode 100644
index 00000000000..1c65c36b13a
--- /dev/null
+++ b/modules/imgaug/tutorials/imgaug_pytorch/imgaug_pytorch.markdown
@@ -0,0 +1,210 @@
+Use imgaug with PyTorch {#tutorial_imgaug_pytorch}
+==============================
+
+@tableofcontents
+
+@prev_tutorial{tutorial_imgaug_object_detection}
+
+| | |
+| -: | :- |
+| Author | Chuyang Zhao |
+| Compatibility | OpenCV >= 4.0 |
+
+Introduction
+------------
+Imgaug is the data augmentation module in OpenCV which allows you to process
+the data before feeding it into the model. Because imgaug is implemented in
+pure C++ and backed by OpenCV's efficient image processing operations,
+it runs faster and more efficiently than other existing Python-based
+implementations. In this tutorial, I will demonstrate how to use imgaug
+with PyTorch; specifically, how to preprocess the data before putting
+it into the PyTorch model for training or inference.
+
+
+Goals
+-----
+In this tutorial, you will learn how to:
+1. Use imgaug to perform data augmentation on your input data
+2. Use imgaug with PyTorch for the image classification task
+3. Use imgaug with PyTorch for the object detection task
+
+
+Usage
+-----
+### Use imgaug with PyTorch in an image classification task
+In this section, we use Imagenette as the training dataset. You can download it [here](https://github.com/fastai/imagenette).
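+
+If you are following along, the snippet below shows one way to load the image list that ships with the
+dataset (a sketch with assumed paths: the archive extracts to an `imagenette2` folder containing
+`noisy_imagenette.csv`; adjust the names if your copy differs). It produces the `root_dir`, `df_data`
+and `df_train` objects used in the rest of this section:
+
+@code{.py}
+import os
+import pandas as pd
+
+root_dir = "imagenette2"   # assumed extraction directory
+df_data = pd.read_csv(os.path.join(root_dir, "noisy_imagenette.csv"))
+df_train = df_data         # the dataset class below splits train/valid using the 'is_valid' column
+@endcode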
+
+First, we define the PyTorch dataset as follows:
+
+@code{.py}
+class ImagenetteDataset(torch.utils.data.Dataset):
+    def __init__(self, root, df_data, mode='train', transform=None):
+        super(ImagenetteDataset, self).__init__()
+        assert mode in ['train', 'valid']
+
+        self.root = root
+        self.transform = transform
+        labels = ['n01440764', 'n02102040', 'n02979186', 'n03000684', 'n03028079', 'n03394916', 'n03417042', 'n03425413', 'n03445777', 'n03888257']
+        self.label_to_num = {v: k for k, v in enumerate(labels)}
+
+        if mode == 'train':
+            self.df_data = df_data[df_data['is_valid'] == False][:256]
+        else:
+            self.df_data = df_data[df_data['is_valid'] == True]
+
+    def __len__(self):
+        return len(self.df_data)
+
+    def __getitem__(self, idx):
+        path = self.df_data.iloc[idx]['path']
+        path = os.path.join(self.root, path)
+        image = self.get_image(path)
+        label = path.split('/')[-2]
+        label = self.label_to_num[label]
+        return image, label
+
+    def get_image(self, path):
+        image = cv2.imread(path)
+        if self.transform:
+            image = self.transform.call(image)
+        image = np.transpose(image, (2, 0, 1))
+        return torch.tensor(image, dtype=torch.float)
+@endcode
+
+In this dataset, the `transform` argument is used to perform data augmentation on each image;
+we pass in the `transforms` object defined below.
+
+The transforms we use contain four data augmentation methods, composed into one
+using the cv::imgaug::Compose class.
+
+@code{.py}
+transforms = cv2.imgaug.Compose([
+    cv2.imgaug.RandomCrop((300, 300), (0,0,0,0)),
+    cv2.imgaug.RandomFlip(),
+    cv2.imgaug.Resize((500, 500)),
+    cv2.imgaug.Normalize(mean=(0.406, 0.456, 0.485), std=(0.225, 0.224, 0.229))
+])
+@endcode
+
+@note The mean and std we pass to cv2.imgaug.Normalize are [0.406, 0.456, 0.485] and [0.225, 0.224, 0.229]
+respectively, which are slightly different from the mean and std of ImageNet (mean = [0.485, 0.456, 0.406], std = [0.229, 0.224, 0.225]).
+This is because the ImageNet mean and std are given for images in RGB format, while images read by OpenCV are in BGR format.
+So we need to reverse the order of the original ImageNet mean and std to make them suitable for images read by OpenCV.
+
+After constructing the dataset and building the model, we can start training our model:
+
+@code{.py}
+train_set = ImagenetteDataset(root_dir, df_train, 'train', transforms)
+
+train_loader = data.DataLoader(train_set, num_workers=0, batch_size=16, drop_last=True, shuffle=True)
+model = resnet18(pretrained=True)
+model.fc = torch.nn.Linear(in_features=512, out_features=10)
+optimizer = torch.optim.Adam(model.parameters(), lr=lr)
+criterion = torch.nn.CrossEntropyLoss()
+
+train(train_loader, model, 1, criterion, optimizer)
+@endcode
+
+Complete code of the example is as follows:
+
+@include samples/train_cls_net.py
+
+### Use imgaug with PyTorch in an object detection task
+
+In this section, we use the Penn-Fudan dataset to train the object detection model.
+You can download the dataset from [here](https://www.cis.upenn.edu/~jshi/ped_html/).
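+
+After extracting the archive, the dataset root should contain the `PNGImages` and `PedMasks` folders that
+the dataset class below relies on (a minimal check; the root path is an assumption):
+
+@code{.py}
+import os
+
+root = "PennFudanPed"              # assumed extraction directory
+print(sorted(os.listdir(root)))    # should include 'PNGImages' and 'PedMasks'
+@endcode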
+
+Similarly, we first define the PyTorch dataset:
+@code{.py}
+class PennFudanDataset(torch.utils.data.Dataset):
+    def __init__(self, root, transforms=None):
+        self.root = root
+        self.transforms = transforms
+        # load all image files, sorting them to
+        # ensure that they are aligned
+        self.imgs = list(sorted(os.listdir(os.path.join(root, "PNGImages"))))
+        self.masks = list(sorted(os.listdir(os.path.join(root, "PedMasks"))))
+
+    def _get_boxes(self, mask):
+        obj_ids = np.unique(mask)
+        # first id is the background, so remove it
+        obj_ids = obj_ids[1:]
+
+        # split the color-encoded mask into a set
+        # of binary masks
+        masks = mask == obj_ids[:, None, None]
+
+        # get bounding box coordinates for each mask
+        num_objs = len(obj_ids)
+        for i in range(num_objs):
+            pos = np.where(masks[i])
+            xmin = np.min(pos[1])
+            xmax = np.max(pos[1])
+            ymin = np.min(pos[0])
+            ymax = np.max(pos[0])
+            yield xmin, ymin, xmax, ymax
+
+    def __getitem__(self, idx):
+        # load images and masks
+        img_path = os.path.join(self.root, "PNGImages", self.imgs[idx])
+        mask_path = os.path.join(self.root, "PedMasks", self.masks[idx])
+        img = cv2.imread(img_path)
+        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
+        # mask is array of size (H, W), all elements of array are integers
+        # background is 0, and each distinct person is represented as a distinct integer starting from 1
+        # you can treat mask as grayscale image
+        mask = cv2.imread(mask_path, cv2.IMREAD_GRAYSCALE)
+        boxes = []
+        for x1, y1, x2, y2 in self._get_boxes(mask):
+            # NOTE: in opencv, box is represented as (x, y, width, height)
+            boxes.append([x1, y1, x2-x1, y2-y1])
+        num_objs = len(boxes)
+        labels = torch.ones((num_objs,), dtype=torch.int64)
+
+        if self.transforms is not None:
+            img, boxes = self.transforms.call(img, boxes)
+
+        # 1. transpose from (h, w, c) to (c, h, w)
+        # 2. normalize data into range 0-1
+        # 3. convert from np.array to torch.tensor
+        img = torch.tensor(np.transpose(img, (2, 0, 1)), dtype=torch.float32)
+        boxes = [[x1, y1, x1+width, y1+height] for x1, y1, width, height in boxes]
+        boxes = torch.as_tensor(boxes, dtype=torch.float32)
+
+        return img, boxes, labels
+
+    def __len__(self):
+        return len(self.imgs)
+
+    @staticmethod
+    def collate_fn(batch):
+        images = list()
+        targets = list()
+
+        for item in batch:
+            images.append(item[0])
+            target = {"boxes": item[1], "labels": item[2]}
+            targets.append(target)
+
+        images = torch.stack(images, dim=0)
+
+        return images, targets
+@endcode
+
+Then we define the transforms we use for data augmentation as:
+@code{.py}
+def get_transforms():
+    transforms = cv2.imgaug.det.Compose([
+        cv2.imgaug.det.RandomFlip(),
+        cv2.imgaug.det.Resize((500, 500)),
+    ])
+    return transforms
+@endcode
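+
+With the dataset and transforms in place, a typical way to put them together is through a PyTorch
+DataLoader using the collate_fn defined above (a short sketch; the dataset path, batch size and worker
+count are arbitrary assumptions):
+
+@code{.py}
+dataset = PennFudanDataset("PennFudanPed", transforms=get_transforms())
+data_loader = torch.utils.data.DataLoader(
+    dataset, batch_size=2, shuffle=True, num_workers=0,
+    collate_fn=PennFudanDataset.collate_fn)
+
+# images: tensor of shape (2, 3, 500, 500); targets: list of dicts with "boxes" and "labels"
+images, targets = next(iter(data_loader))
+@endcode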
+
+Complete code of the example is as follows:
+
+@include samples/train_det_net.py
\ No newline at end of file
diff --git a/modules/imgaug/tutorials/table_of_content_imgaug.markdown b/modules/imgaug/tutorials/table_of_content_imgaug.markdown
new file mode 100644
index 00000000000..90b087937ee
--- /dev/null
+++ b/modules/imgaug/tutorials/table_of_content_imgaug.markdown
@@ -0,0 +1,36 @@
+Tutorials for data augmentation module {#tutorial_table_of_content_imgaug}
+===============================================================
+
+Data augmentation techniques are widely used in deep learning training to expand
+the training samples and overcome the overfitting problem. The imgaug module in OpenCV is
+implemented in pure C++ and powered by OpenCV's efficient image processing operations,
+so it runs much faster and more efficiently than Python-based implementations.
+
+With the binding generator provided by OpenCV, imgaug can be used not only from C++ but also from
+other languages like Python, Java, etc. Conversely, you can easily port your code
+from these languages to C++, which is especially useful when you want to move
+a model and its data preprocessing pipeline from Python to a C++ production environment.
+
+- @subpage tutorial_imgaug_basic_usage
+
+  *Compatibility:* >= OpenCV 4.0
+
+  *Author:* Chuyang Zhao
+
+  Basic usage of the imgaug module. Perform data augmentation on images.
+
+- @subpage tutorial_imgaug_object_detection
+
+  *Compatibility:* >= OpenCV 4.0
+
+  *Author:* Chuyang Zhao
+
+  Use imgaug to perform data augmentation for the object detection task.
+
+- @subpage tutorial_imgaug_pytorch
+
+  *Compatibility:* >= OpenCV 4.0
+
+  *Author:* Chuyang Zhao
+
+  Use imgaug with PyTorch for different computer vision tasks.