
Jenkins for Pipelines

vjrj edited this page Jun 6, 2022 · 14 revisions

Intro

The pipelines playbook includes a role to install, configure, and use Jenkins for pipelines data-processing jobs.

Additional steps

Extra plugin requirements

Follow the role README to install the Jenkins plugins that the role depends on.

Script approvals

Some of the preconfigured jobs will need extra script approval during their first run:

Navigate to Jenkins > Manage Jenkins > In-process Script Approval

Verify master label

Navigate to Jenkins > Manage Jenkins > Configure > Labels and verify that your Jenkins has the correct labels: one that matches your spark and pipelines node name, and master.

Add the SSH credential configured in the pipelines jobs

Manage Jenkins > Manage credentials > Jenkins (global) > Global credentials > Add Credential

Add your Jenkins slave nodes

Migrate uuid job

You will need several things to run this job. On your production Cassandra server:

  1. Create this directory:
# mkdir /data/uuid-exports/
# chown someuser:someuser /data/uuid-exports/ # optionally
  2. Copy this script to /data/uuid-exports/uuid-export.sh and give it execute permissions.
  3. Make sure that you can connect over SSH without a password from your pipelines Jenkins to your Cassandra server, from the spark user to a someuser account on Cassandra that can run the previous script.
  4. Adapt the migration-uuid job to fit your infrastructure and users.
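
The passwordless SSH requirement above can be checked before running the job. The following is a minimal sketch, not part of the playbook: the function name check_passwordless_ssh is hypothetical, and the user, host, and script path are whatever applies in your setup. It relies on the standard OpenSSH BatchMode option, which makes ssh fail rather than prompt for a password.

```shell
# Hypothetical helper: returns 0 only if key-based login works and the
# export script exists and is executable on the remote side.
#   $1 = remote user (e.g. someuser)
#   $2 = Cassandra host (assumption: reachable by this name)
#   $3 = path to the export script on the remote host
check_passwordless_ssh() {
    # BatchMode=yes disables password prompts, so a missing key fails fast.
    ssh -o BatchMode=yes -o ConnectTimeout=10 "$1@$2" "test -x '$3'"
}
```

Run it as the spark user on the pipelines Jenkins node, for example `check_passwordless_ssh someuser your-cassandra-host /data/uuid-exports/uuid-export.sh && echo OK`, where your-cassandra-host is a placeholder.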

Datasets copy

To test, copy some of your datasets to /data/dwca-export like this:

/data/dwca-export
    ├── dr289
    │   └── dr289.zip
    ├── dr490
    │   └── dr490.zip
    ├── dr603
    │   └── dr603.zip
    
    (...)
    
    └── dr879
        └── dr879.zip

To start with smoke tests, please try small datasets (< 300,000).
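
The directory layout above (one folder per dataset, holding a zip of the same name) could be produced with a small helper like the one below. This is a sketch under assumptions: the function name stage_datasets is hypothetical, and it presumes your archives already sit in a single source directory named dr<id>.zip.

```shell
# Hypothetical helper: copy every dr*.zip found in a source directory
# into <dest>/<id>/<id>.zip, matching the layout shown above.
#   $1 = directory holding the dr*.zip archives (assumption)
#   $2 = destination, e.g. /data/dwca-export
stage_datasets() {
    src_dir=$1
    dest_dir=$2
    for zip in "$src_dir"/dr*.zip; do
        # If nothing matches, the glob stays literal; skip it.
        [ -e "$zip" ] || continue
        id=$(basename "$zip" .zip)
        mkdir -p "$dest_dir/$id"
        cp "$zip" "$dest_dir/$id/$id.zip"
    done
}
```

For example, `stage_datasets /tmp/my-archives /data/dwca-export`, where /tmp/my-archives is a placeholder for wherever your archives are stored.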
