
LinkedIn Scraper API

A FastAPI-based service for scraping LinkedIn profiles using Apify actors. The API lets you search for LinkedIn profiles by keyword or scrape specific profile URLs, and returns structured profile information.

Features

  • Search LinkedIn profiles using keywords and location filters
  • Scrape detailed profile information using specific LinkedIn profile URLs
  • Combined search-and-scrape functionality to find and extract detailed profile data
  • Batch processing of lead generation requests
  • Clean, structured output of profile information

Requirements

  • Python 3.8+
  • FastAPI
  • Uvicorn
  • Apify Client
  • Pydantic
  • MongoDB

Installation

Standard Installation

  1. Clone the repository:

    git clone https://github.com/Tecosys/Linkdin-Scrapper.git
    cd Linkdin-Scrapper
    
  2. Install dependencies:

    pip install -r requirements.txt
    
  3. Create a .env file in the root directory and add your Apify API key and MongoDB connection settings:

    DEFAULT_APIFY_API_KEY=your_apify_key_here
    MONGODB_URL=mongodb://localhost:27017
    MONGODB_DB_NAME=linkedin_scraper
    
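The service reads these variables from the environment at startup. A minimal sketch of how a client or script can pick up the same settings (the variable names come from the .env example above; the defaults shown are assumptions matching it):

```python
import os

def load_config() -> dict:
    """Read the settings named in the .env example; defaults mirror that example."""
    return {
        "apify_api_key": os.environ.get("DEFAULT_APIFY_API_KEY", ""),
        "mongodb_url": os.environ.get("MONGODB_URL", "mongodb://localhost:27017"),
        "mongodb_db_name": os.environ.get("MONGODB_DB_NAME", "linkedin_scraper"),
    }

if __name__ == "__main__":
    print(load_config())
```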

Docker Installation

  1. Build the Docker image:

    docker build -t linkedin-scraper .
    
  2. Run the container:

    docker run -d -p 8007:8007 --env DEFAULT_APIFY_API_KEY=your_apify_key_here --env MONGODB_URL=mongodb://mongo:27017 --name linkedin-scraper linkedin-scraper
    

    Alternatively, you can use a .env file:

    docker run -d -p 8007:8007 --env-file .env --name linkedin-scraper linkedin-scraper
    
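Since the lead-generation feature also needs MongoDB, the two containers can be run together. A hypothetical docker-compose.yml sketch (service names, volume name, and the mongo image tag are assumptions; the `mongo` hostname matches the MONGODB_URL used in the docker run example above):

```yaml
services:
  mongo:
    image: mongo:7
    volumes:
      - mongo-data:/data/db
  linkedin-scraper:
    build: .
    ports:
      - "8007:8007"
    environment:
      DEFAULT_APIFY_API_KEY: your_apify_key_here
      MONGODB_URL: mongodb://mongo:27017
      MONGODB_DB_NAME: linkedin_scraper
    depends_on:
      - mongo
volumes:
  mongo-data:
```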

Usage

Standard Usage

Start the API server:

python run.py

Docker Usage

The API will start automatically when you run the container.

The API will be available at http://localhost:8007 by default.

API Endpoints

1. Scrape by Keywords

Endpoint: POST /api/v1/scrape

Request body:

{
  "keywords": ["software engineer", "python"],
  "limit": 10,
  "location": ["New York", "Remote"],
  "apify_api_key": "optional_custom_key"
}

Response: List of LinkedIn profiles matching the keywords
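A client-side sketch of this call using only the standard library (the base URL assumes the default port above; the helper name is hypothetical). The actual request is only sent in the `__main__` block, so it requires a running server:

```python
import json
import urllib.request

BASE_URL = "http://localhost:8007"  # default address from this README

def build_scrape_request(keywords, limit=10, location=None, apify_api_key=None):
    """Assemble the POST request for /api/v1/scrape; the payload mirrors the example above."""
    payload = {"keywords": list(keywords), "limit": limit}
    if location:
        payload["location"] = list(location)
    if apify_api_key:
        payload["apify_api_key"] = apify_api_key
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/scrape",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

if __name__ == "__main__":
    req = build_scrape_request(["software engineer", "python"],
                               limit=10, location=["New York", "Remote"])
    with urllib.request.urlopen(req) as resp:  # needs the API server running
        print(json.loads(resp.read()))
```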

2. Scrape by URLs

Endpoint: POST /api/v1/scrape-by-url

Request body:

{
  "profileUrls": ["https://www.linkedin.com/in/username1", "https://www.linkedin.com/in/username2"],
  "apify_api_key": "optional_custom_key"
}

Response: Detailed information for the specified LinkedIn profiles
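Before sending URLs to this endpoint it can be worth sanity-checking them client-side. A small sketch (the helper name and the exact URL shape the API accepts are assumptions):

```python
import json

def build_profile_payload(profile_urls, apify_api_key=None) -> str:
    """Validate profile URLs and assemble the JSON body for /api/v1/scrape-by-url."""
    for url in profile_urls:
        # Light check only; adjust if the API accepts other LinkedIn URL shapes.
        if "linkedin.com/in/" not in url:
            raise ValueError(f"not a LinkedIn profile URL: {url}")
    payload = {"profileUrls": list(profile_urls)}
    if apify_api_key:
        payload["apify_api_key"] = apify_api_key
    return json.dumps(payload)
```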

3. Complete Scrape (Combined Search and Profile Extraction)

Endpoint: POST /api/v1/complete-scrape

Request body:

{
  "keywords": ["software engineer", "python"],
  "limit": 10,
  "location": ["New York", "Remote"],
  "apify_api_key": "optional_custom_key"
}

Response: Simplified profile information including:

  • First name
  • Last name
  • Location
  • Email
  • Company
  • Position
  • Mobile number
  • Company size
  • Profile URL
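The field list above maps naturally onto a small container type. A sketch, assuming the camelCase key names shown in the lead-generation results example later in this README also apply to /complete-scrape responses:

```python
from dataclasses import dataclass

@dataclass
class SimpleProfile:
    first_name: str = ""
    last_name: str = ""
    location: str = ""
    email: str = ""
    company: str = ""
    position: str = ""
    mobile_number: str = ""
    company_size: str = ""
    profile_url: str = ""

def parse_profile(item: dict) -> SimpleProfile:
    """Map one response item to SimpleProfile; missing keys default to empty strings."""
    return SimpleProfile(
        first_name=item.get("firstName", ""),
        last_name=item.get("lastName", ""),
        location=item.get("location", ""),
        email=item.get("email", ""),
        company=item.get("company", ""),
        position=item.get("position", ""),
        mobile_number=item.get("mobileNumber", ""),
        company_size=item.get("companySize", ""),
        profile_url=item.get("profileUrl", ""),
    )
```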

4. Lead Generation

4.1. Start Lead Generation

Endpoint: POST /api/v1/leadgenerate/{your-custom-id}

For example: POST /api/v1/leadgenerate/skhdsdysvds-dsdsdsds

Request body:

{
  "limit": 1000,
  "emailOnly": true,
  "keywords": ["software engineer", "python", "data scientist"],
  "location": ["New York", "Remote"],
  "apify_api_key": "optional_custom_key"
}

Response:

{
  "message": "Lead generation started",
  "processingId": "skhdsdysvds-dsdsdsds"
}

This endpoint starts an asynchronous process to generate leads:

  • The custom ID is provided directly in the URL path
  • It processes leads in batches of 100 until the limit is reached
  • If emailOnly is true, it filters to include only users with emails
  • keywords can be provided to specify what type of profiles to search for (defaults to a set of common job titles)
  • location can be provided to filter profiles by location
  • apify_api_key can be provided to use a specific Apify API key for this operation
  • All data is saved into MongoDB under the provided custom ID
  • Returns the same custom ID as processingId for tracking the operation
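Since the custom ID is chosen by the caller, a fresh UUID is a convenient choice. A client-side sketch (stdlib only; the helper name is hypothetical, and the request is only sent in the `__main__` block):

```python
import json
import urllib.request
import uuid

BASE_URL = "http://localhost:8007"  # default address from this README

def start_lead_generation(keywords, limit=1000, email_only=True, location=None):
    """Build the POST request for /api/v1/leadgenerate/{custom_id} using a fresh UUID."""
    custom_id = str(uuid.uuid4())
    payload = {"limit": limit, "emailOnly": email_only, "keywords": list(keywords)}
    if location:
        payload["location"] = list(location)
    req = urllib.request.Request(
        f"{BASE_URL}/api/v1/leadgenerate/{custom_id}",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    return custom_id, req

if __name__ == "__main__":
    custom_id, req = start_lead_generation(["software engineer", "python"], limit=200)
    with urllib.request.urlopen(req) as resp:  # needs the API server running
        print(json.loads(resp.read()))  # should echo custom_id as processingId
```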

4.2. Check Lead Generation Status

Endpoint: GET /api/v1/leadgenerate/{processing_id}/status

For example: GET /api/v1/leadgenerate/skhdsdysvds-dsdsdsds/status

Response:

{
  "processingId": "skhdsdysvds-dsdsdsds",
  "status": "completed",
  "startedAt": "2023-07-15T10:30:00.000Z",
  "completedAt": "2023-07-15T10:35:45.000Z",
  "processedCount": 850,
  "totalCount": 850,
  "limit": 1000,
  "emailOnly": true,
  "keywords": ["software engineer", "python", "data scientist"],
  "location": ["New York", "Remote"]
}

This endpoint returns the current status of a lead generation process:

  • Possible status values: "processing", "completed", "failed"
  • Includes start and completion timestamps
  • Shows how many leads have been processed so far
  • Returns the original request parameters

4.3. Get Lead Generation Results

Endpoint: GET /api/v1/leadgenerate/{processing_id}/results

For example: GET /api/v1/leadgenerate/skhdsdysvds-dsdsdsds/results?page=1&page_size=50

Query Parameters:

  • page: Page number (default: 1)
  • page_size: Number of results per page (default: 50, max: 100)

Response:

{
  "processingId": "skhdsdysvds-dsdsdsds",
  "status": "completed",
  "pagination": {
    "page": 1,
    "pageSize": 50,
    "totalItems": 850,
    "totalPages": 17
  },
  "leads": [
    {
      "firstName": "John",
      "lastName": "Doe",
      "location": "New York, NY",
      "email": "john.doe@example.com",
      "company": "Tech Company",
      "position": "Software Engineer",
      "mobileNumber": "+1234567890",
      "companySize": "1001-5000",
      "profileUrl": "https://www.linkedin.com/in/johndoe"
    },
    // ... more leads
  ]
}

This endpoint returns the generated leads with pagination:

  • Includes the current status of the process
  • Provides pagination information (current page, total pages, etc.)
  • Returns a list of leads with all their details
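With `page_size` capped at 100, fetching all leads means walking the pages. A sketch of a paging iterator (the `fetch` parameter is a hypothetical injection point so the paging logic can be tested offline):

```python
import json
import urllib.request

BASE_URL = "http://localhost:8007"  # default address from this README

def iter_leads(processing_id, page_size=100, fetch=None):
    """Yield every lead from GET .../results, advancing pages until totalPages."""
    if fetch is None:
        def fetch(url):
            with urllib.request.urlopen(url) as resp:
                return json.loads(resp.read())
    page = 1
    while True:
        url = (f"{BASE_URL}/api/v1/leadgenerate/{processing_id}/results"
               f"?page={page}&page_size={page_size}")
        data = fetch(url)
        yield from data["leads"]
        if page >= data["pagination"]["totalPages"]:
            break
        page += 1
```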

4.4. Delete Lead Generation Data

Endpoint: DELETE /api/v1/leadgenerate/{processing_id}

For example: DELETE /api/v1/leadgenerate/skhdsdysvds-dsdsdsds

Response:

{
  "processingId": "skhdsdysvds-dsdsdsds",
  "processDeleted": true,
  "leadsDeleted": 850
}

This endpoint deletes a lead generation process and all its generated leads:

  • Removes the process document from the database
  • Removes all leads associated with the process
  • Returns the number of deleted items

Notes

  • This service requires an Apify API key to work properly
  • MongoDB is required for the lead generation feature
  • The number of profiles you can scrape may be limited by your Apify subscription
  • Please use responsibly and in accordance with LinkedIn's terms of service
