
LinkedIn Scraper API

A FastAPI-based service for scraping LinkedIn profiles using Apify actors. The API lets you search for LinkedIn profiles by keyword or scrape specific profile URLs, and returns structured profile information.

Features

  • Search LinkedIn profiles using keywords and location filters
  • Scrape detailed profile information using specific LinkedIn profile URLs
  • Combined search-and-scrape functionality to find and extract detailed profile data
  • Batch processing of lead generation requests
  • Clean, structured output of profile information

Requirements

  • Python 3.8+
  • FastAPI
  • Uvicorn
  • Apify Client
  • Pydantic
  • MongoDB

Installation

Standard Installation

  1. Clone the repository:

    git clone https://github.com/Tecosys/Linkdin-Scrapper.git
    cd Linkdin-Scrapper
    
  2. Install dependencies:

    pip install -r requirements.txt
    
  3. Create a .env file in the root directory and add your Apify API key and MongoDB connection settings:

    DEFAULT_APIFY_API_KEY=your_apify_key_here
    MONGODB_URL=mongodb://localhost:27017
    MONGODB_DB_NAME=linkedin_scraper
    
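The service reads these variables from the environment at startup. A minimal sketch of how a client or script can pick up the same settings (the variable names come from the .env example above; the defaults shown are assumptions matching it):

```python
import os

def load_config() -> dict:
    """Read the settings named in the .env example; defaults mirror that example."""
    return {
        "apify_api_key": os.environ.get("DEFAULT_APIFY_API_KEY", ""),
        "mongodb_url": os.environ.get("MONGODB_URL", "mongodb://localhost:27017"),
        "mongodb_db_name": os.environ.get("MONGODB_DB_NAME", "linkedin_scraper"),
    }

if __name__ == "__main__":
    print(load_config())
```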

Docker Installation

  1. Build the Docker image:

    docker build -t linkedin-scraper .
    
  2. Run the container:

    docker run -d -p 8007:8007 --env DEFAULT_APIFY_API_KEY=your_apify_key_here --env MONGODB_URL=mongodb://mongo:27017 --name linkedin-scraper linkedin-scraper
    

    Alternatively, you can use a .env file:

    docker run -d -p 8007:8007 --env-file .env --name linkedin-scraper linkedin-scraper
    
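Since the lead-generation feature also needs MongoDB, the two containers can be run together. A hypothetical docker-compose.yml sketch (service names, volume name, and the mongo image tag are assumptions; the `mongo` hostname matches the MONGODB_URL used in the docker run example above):

```yaml
services:
  mongo:
    image: mongo:7
    volumes:
      - mongo-data:/data/db
  linkedin-scraper:
    build: .
    ports:
      - "8007:8007"
    environment:
      DEFAULT_APIFY_API_KEY: your_apify_key_here
      MONGODB_URL: mongodb://mongo:27017
      MONGODB_DB_NAME: linkedin_scraper
    depends_on:
      - mongo
volumes:
  mongo-data:
```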

Usage

Standard Usage

Start the API server:

python run.py

Docker Usage

The API will start automatically when you run the container.

The API will be available at http://localhost:8007 by default.

API Endpoints

1. Scrape by Keywords

Endpoint: POST /api/v1/scrape

Request body:

{
  "keywords": ["software engineer", "python"],
  "limit": 10,
  "location": ["New York", "Remote"],
  "apify_api_key": "optional_custom_key"
}

Response: List of LinkedIn profiles matching the keywords
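A client-side sketch of this call using only the standard library (the base URL assumes the default port above; the helper name is hypothetical). The actual request is only sent in the `__main__` block, so it requires a running server:

```python
import json
import urllib.request

BASE_URL = "http://localhost:8007"  # default address from this README

def build_scrape_request(keywords, limit=10, location=None, apify_api_key=None):
    """Assemble the POST request for /api/v1/scrape; the payload mirrors the example above."""
    payload = {"keywords": list(keywords), "limit": limit}
    if location:
        payload["location"] = list(location)
    if apify_api_key:
        payload["apify_api_key"] = apify_api_key
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/scrape",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

if __name__ == "__main__":
    req = build_scrape_request(["software engineer", "python"],
                               limit=10, location=["New York", "Remote"])
    with urllib.request.urlopen(req) as resp:  # needs the API server running
        print(json.loads(resp.read()))
```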

2. Scrape by URLs

Endpoint: POST /api/v1/scrape-by-url

Request body:

{
  "profileUrls": ["https://www.linkedin.com/in/username1", "https://www.linkedin.com/in/username2"],
  "apify_api_key": "optional_custom_key"
}

Response: Detailed information for the specified LinkedIn profiles
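Before sending URLs to this endpoint it can be worth sanity-checking them client-side. A small sketch (the helper name and the exact URL shape the API accepts are assumptions):

```python
import json

def build_profile_payload(profile_urls, apify_api_key=None) -> str:
    """Validate profile URLs and assemble the JSON body for /api/v1/scrape-by-url."""
    for url in profile_urls:
        # Light check only; adjust if the API accepts other LinkedIn URL shapes.
        if "linkedin.com/in/" not in url:
            raise ValueError(f"not a LinkedIn profile URL: {url}")
    payload = {"profileUrls": list(profile_urls)}
    if apify_api_key:
        payload["apify_api_key"] = apify_api_key
    return json.dumps(payload)
```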

3. Complete Scrape (Combined Search and Profile Extraction)

Endpoint: POST /api/v1/complete-scrape

Request body:

{
  "keywords": ["software engineer", "python"],
  "limit": 10,
  "location": ["New York", "Remote"],
  "apify_api_key": "optional_custom_key"
}

Response: Simplified profile information including:

  • First name
  • Last name
  • Location
  • Email
  • Company
  • Position
  • Mobile number
  • Company size
  • Profile URL
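The field list above maps naturally onto a small container type. A sketch, assuming the camelCase key names shown in the lead-generation results example later in this README also apply to /complete-scrape responses:

```python
from dataclasses import dataclass

@dataclass
class SimpleProfile:
    first_name: str = ""
    last_name: str = ""
    location: str = ""
    email: str = ""
    company: str = ""
    position: str = ""
    mobile_number: str = ""
    company_size: str = ""
    profile_url: str = ""

def parse_profile(item: dict) -> SimpleProfile:
    """Map one response item to SimpleProfile; missing keys default to empty strings."""
    return SimpleProfile(
        first_name=item.get("firstName", ""),
        last_name=item.get("lastName", ""),
        location=item.get("location", ""),
        email=item.get("email", ""),
        company=item.get("company", ""),
        position=item.get("position", ""),
        mobile_number=item.get("mobileNumber", ""),
        company_size=item.get("companySize", ""),
        profile_url=item.get("profileUrl", ""),
    )
```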

4. Lead Generation

4.1. Start Lead Generation

Endpoint: POST /api/v1/leadgenerate/{your-custom-id}

For example: POST /api/v1/leadgenerate/skhdsdysvds-dsdsdsds

Request body:

{
  "limit": 1000,
  "emailOnly": true,
  "keywords": ["software engineer", "python", "data scientist"],
  "location": ["New York", "Remote"],
  "apify_api_key": "optional_custom_key"
}

Response:

{
  "message": "Lead generation started",
  "processingId": "skhdsdysvds-dsdsdsds"
}

This endpoint starts an asynchronous process to generate leads:

  • The custom ID is provided directly in the URL path
  • It processes leads in batches of 100 until the limit is reached
  • If emailOnly is true, it filters to include only users with emails
  • keywords can be provided to specify what type of profiles to search for (defaults to a set of common job titles)
  • location can be provided to filter profiles by location
  • apify_api_key can be provided to use a specific Apify API key for this operation
  • All data is saved into MongoDB under the provided custom ID
  • Returns the same custom ID as processingId for tracking the operation
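Since the custom ID is chosen by the caller, a fresh UUID is a convenient choice. A client-side sketch (stdlib only; the helper name is hypothetical, and the request is only sent in the `__main__` block):

```python
import json
import urllib.request
import uuid

BASE_URL = "http://localhost:8007"  # default address from this README

def start_lead_generation(keywords, limit=1000, email_only=True, location=None):
    """Build the POST request for /api/v1/leadgenerate/{custom_id} using a fresh UUID."""
    custom_id = str(uuid.uuid4())
    payload = {"limit": limit, "emailOnly": email_only, "keywords": list(keywords)}
    if location:
        payload["location"] = list(location)
    req = urllib.request.Request(
        f"{BASE_URL}/api/v1/leadgenerate/{custom_id}",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    return custom_id, req

if __name__ == "__main__":
    custom_id, req = start_lead_generation(["software engineer", "python"], limit=200)
    with urllib.request.urlopen(req) as resp:  # needs the API server running
        print(json.loads(resp.read()))  # should echo custom_id as processingId
```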

4.2. Check Lead Generation Status

Endpoint: GET /api/v1/leadgenerate/{processing_id}/status

For example: GET /api/v1/leadgenerate/skhdsdysvds-dsdsdsds/status

Response:

{
  "processingId": "skhdsdysvds-dsdsdsds",
  "status": "completed",
  "startedAt": "2023-07-15T10:30:00.000Z",
  "completedAt": "2023-07-15T10:35:45.000Z",
  "processedCount": 850,
  "totalCount": 850,
  "limit": 1000,
  "emailOnly": true,
  "keywords": ["software engineer", "python", "data scientist"],
  "location": ["New York", "Remote"]
}

This endpoint returns the current status of a lead generation process:

  • Possible status values: "processing", "completed", "failed"
  • Includes start and completion timestamps
  • Shows how many leads have been processed so far
  • Returns the original request parameters

4.3. Get Lead Generation Results

Endpoint: GET /api/v1/leadgenerate/{processing_id}/results

For example: GET /api/v1/leadgenerate/skhdsdysvds-dsdsdsds/results?page=1&page_size=50

Query Parameters:

  • page: Page number (default: 1)
  • page_size: Number of results per page (default: 50, max: 100)

Response:

{
  "processingId": "skhdsdysvds-dsdsdsds",
  "status": "completed",
  "pagination": {
    "page": 1,
    "pageSize": 50,
    "totalItems": 850,
    "totalPages": 17
  },
  "leads": [
    {
      "firstName": "John",
      "lastName": "Doe",
      "location": "New York, NY",
      "email": "john.doe@example.com",
      "company": "Tech Company",
      "position": "Software Engineer",
      "mobileNumber": "+1234567890",
      "companySize": "1001-5000",
      "profileUrl": "https://www.linkedin.com/in/johndoe"
    },
    // ... more leads
  ]
}

This endpoint returns the generated leads with pagination:

  • Includes the current status of the process
  • Provides pagination information (current page, total pages, etc.)
  • Returns a list of leads with all their details
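With `page_size` capped at 100, fetching all leads means walking the pages. A sketch of a paging iterator (the `fetch` parameter is a hypothetical injection point so the paging logic can be tested offline):

```python
import json
import urllib.request

BASE_URL = "http://localhost:8007"  # default address from this README

def iter_leads(processing_id, page_size=100, fetch=None):
    """Yield every lead from GET .../results, advancing pages until totalPages."""
    if fetch is None:
        def fetch(url):
            with urllib.request.urlopen(url) as resp:
                return json.loads(resp.read())
    page = 1
    while True:
        url = (f"{BASE_URL}/api/v1/leadgenerate/{processing_id}/results"
               f"?page={page}&page_size={page_size}")
        data = fetch(url)
        yield from data["leads"]
        if page >= data["pagination"]["totalPages"]:
            break
        page += 1
```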

4.4. Delete Lead Generation Data

Endpoint: DELETE /api/v1/leadgenerate/{processing_id}

For example: DELETE /api/v1/leadgenerate/skhdsdysvds-dsdsdsds

Response:

{
  "processingId": "skhdsdysvds-dsdsdsds",
  "processDeleted": true,
  "leadsDeleted": 850
}

This endpoint deletes a lead generation process and all its generated leads:

  • Removes the process document from the database
  • Removes all leads associated with the process
  • Returns the number of deleted items

Notes

  • This service requires an Apify API key to work properly
  • MongoDB is required for the lead generation feature
  • The number of profiles you can scrape may be limited by your Apify subscription
  • Please use responsibly and in accordance with LinkedIn's terms of service
