A FastAPI-based service for scraping LinkedIn profiles using Apify actors. This API allows you to search for LinkedIn profiles based on keywords or specific URLs and extract relevant profile information.
- Search LinkedIn profiles using keywords and location filters
- Scrape detailed profile information using specific LinkedIn profile URLs
- Combined search-and-scrape functionality to find and extract detailed profile data
- Batch processing of lead generation requests
- Clean, structured output of profile information
- Python 3.8+
- FastAPI
- Uvicorn
- Apify Client
- Pydantic
- MongoDB
- Clone the repository:
git clone https://github.com/yourusername/Linkdin-Scrapper.git
cd Linkdin-Scrapper
- Install dependencies:
pip install -r requirements.txt
- Create a .env file in the root directory and add your Apify API key and MongoDB settings:
DEFAULT_APIFY_API_KEY=your_apify_key_here
MONGODB_URL=mongodb://localhost:27017
MONGODB_DB_NAME=linkedin_scraper
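As a rough illustration of how these three variables might be consumed, the sketch below reads them from a mapping such as os.environ. The Settings class, the load_settings helper, and the fallback values for the two MongoDB variables are assumptions made for this example, not the project's actual configuration code.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the variables listed in the .env file."""
    apify_api_key: str
    mongodb_url: str
    mongodb_db_name: str


def load_settings(env=None):
    """Build Settings from a mapping (defaults to the process environment)."""
    env = os.environ if env is None else env
    key = env.get("DEFAULT_APIFY_API_KEY", "")
    if not key:
        # The notes in this README say an Apify key is required.
        raise ValueError("DEFAULT_APIFY_API_KEY is not set")
    return Settings(
        apify_api_key=key,
        # Fallbacks below mirror the example values above (an assumption).
        mongodb_url=env.get("MONGODB_URL", "mongodb://localhost:27017"),
        mongodb_db_name=env.get("MONGODB_DB_NAME", "linkedin_scraper"),
    )
```

Passing a plain dict instead of os.environ makes the helper easy to exercise without touching the real environment.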
- Build the Docker image:
docker build -t linkedin-scraper .
- Run the container:
docker run -d -p 8007:8007 --env DEFAULT_APIFY_API_KEY=your_apify_key_here --env MONGODB_URL=mongodb://mongo:27017 --name linkedin-scraper linkedin-scraper
Alternatively, you can use a .env file:
docker run -d -p 8007:8007 --env-file .env --name linkedin-scraper linkedin-scraper
Start the API server:
python run.py
If you are using Docker, the API starts automatically when you run the container.
The API will be available at http://localhost:8007 by default.
Endpoint: POST /api/v1/scrape
Request body:
{
"keywords": ["software engineer", "python"],
"limit": 10,
"location": ["New York", "Remote"],
"apify_api_key": "optional_custom_key"
}
Response: List of LinkedIn profiles matching the keywords
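A minimal client sketch for this endpoint, using only the Python standard library. The helper names build_scrape_request and search_profiles are hypothetical, and treating location and apify_api_key as optional fields is an assumption based on the example body above.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8007"  # default address given in this README


def build_scrape_request(keywords, limit=10, location=None, apify_api_key=None):
    """Prepare (but do not send) a POST request for /api/v1/scrape."""
    payload = {"keywords": list(keywords), "limit": limit}
    if location:
        payload["location"] = list(location)
    if apify_api_key:
        # Assumed optional: the server falls back to its default key.
        payload["apify_api_key"] = apify_api_key
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/scrape",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search_profiles(keywords, **kwargs):
    """Send the request and decode the JSON response body."""
    request = build_scrape_request(keywords, **kwargs)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, search_profiles(["software engineer", "python"], limit=10, location=["New York", "Remote"]) would send the request body shown above.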
Endpoint: POST /api/v1/scrape-by-url
Request body:
{
"profileUrls": ["https://www.linkedin.com/in/username1", "https://www.linkedin.com/in/username2"],
"apify_api_key": "optional_custom_key"
}
Response: Detailed information for the specified LinkedIn profiles
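Before calling this endpoint it can help to drop inputs that are clearly not profile URLs. The sketch below is a client-side check only; the is_profile_url rule (https, a linkedin.com host, a path under /in/) and both helper names are assumptions for illustration, and the service may accept or reject URLs differently.

```python
from urllib.parse import urlparse

LINKEDIN_HOSTS = {"www.linkedin.com", "linkedin.com"}


def is_profile_url(url):
    """Return True for URLs shaped like https://www.linkedin.com/in/<name>."""
    parts = urlparse(url)
    return (
        parts.scheme == "https"
        and parts.netloc in LINKEDIN_HOSTS
        and parts.path.startswith("/in/")
        and len(parts.path.strip("/")) > len("in")  # something follows /in/
    )


def build_scrape_by_url_payload(urls, apify_api_key=None):
    """Build the JSON body for /api/v1/scrape-by-url from valid URLs only."""
    valid = [u for u in urls if is_profile_url(u)]
    if not valid:
        raise ValueError("no valid LinkedIn profile URLs supplied")
    payload = {"profileUrls": valid}
    if apify_api_key:
        payload["apify_api_key"] = apify_api_key
    return payload
```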
Endpoint: POST /api/v1/complete-scrape
Request body:
{
"keywords": ["software engineer", "python"],
"limit": 10,
"location": ["New York", "Remote"],
"apify_api_key": "optional_custom_key"
}
Response: Simplified profile information including:
- First name
- Last name
- Location
- Company
- Position
- Mobile number
- Company size
- Profile URL
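One way to work with these fields on the client side is to map each item onto a small typed record. The SimplifiedProfile class below is hypothetical, and the camelCase key names are an assumption borrowed from the lead objects shown in the results example later in this README; the actual response keys may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SimplifiedProfile:
    """Hypothetical client-side record for one simplified profile."""
    first_name: Optional[str] = None
    last_name: Optional[str] = None
    location: Optional[str] = None
    company: Optional[str] = None
    position: Optional[str] = None
    mobile_number: Optional[str] = None
    company_size: Optional[str] = None
    profile_url: Optional[str] = None

    @classmethod
    def from_dict(cls, item):
        """Map assumed camelCase JSON keys to fields; missing keys stay None."""
        return cls(
            first_name=item.get("firstName"),
            last_name=item.get("lastName"),
            location=item.get("location"),
            company=item.get("company"),
            position=item.get("position"),
            mobile_number=item.get("mobileNumber"),
            company_size=item.get("companySize"),
            profile_url=item.get("profileUrl"),
        )
```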
Endpoint: POST /api/v1/leadgenerate/{your-custom-id}
For example: POST /api/v1/leadgenerate/skhdsdysvds-dsdsdsds
Request body:
{
"limit": 1000,
"emailOnly": true,
"keywords": ["software engineer", "python", "data scientist"],
"location": ["New York", "Remote"],
"apify_api_key": "optional_custom_key"
}
Response:
{
"message": "Lead generation started",
"processingId": "skhdsdysvds-dsdsdsds"
}
This endpoint starts an asynchronous process to generate leads:
- The custom ID is provided directly in the URL path
- It processes leads in batches of 100 until the limit is reached
- If emailOnly is true, it filters to include only users with emails
- keywords can be provided to specify what type of profiles to search for (defaults to a set of common job titles)
- location can be provided to filter profiles by location
- apify_api_key can be provided to use a specific Apify API key for this operation
- All data is saved into MongoDB under the provided custom ID
- Returns the same custom ID as processingId for tracking the operation
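The batching behaviour described above can be sketched as a simple loop. This is an illustration, not the service's implementation: fetch_batch is a hypothetical callable standing in for the Apify call, and counting the limit after the emailOnly filter is an assumption the README does not confirm.

```python
def has_email(lead):
    """A lead counts as having an email when the field is non-empty."""
    return bool(lead.get("email"))


def collect_leads(fetch_batch, limit, email_only=False, batch_size=100):
    """Fetch leads batch by batch until `limit` is reached or data runs out.

    fetch_batch(offset, size) is a hypothetical function returning a list
    of lead dicts; an empty list signals that no more results exist.
    """
    leads = []
    offset = 0
    while len(leads) < limit:
        batch = fetch_batch(offset, batch_size)
        if not batch:
            break  # source exhausted before the limit was reached
        offset += len(batch)
        if email_only:
            batch = [lead for lead in batch if has_email(lead)]
        # Never store more than the requested limit.
        leads.extend(batch[: limit - len(leads)])
    return leads
```

In the real service each batch would presumably be written to MongoDB under the custom ID rather than held in a list.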
Endpoint: GET /api/v1/leadgenerate/{processing_id}/status
For example: GET /api/v1/leadgenerate/skhdsdysvds-dsdsdsds/status
Response:
{
"processingId": "skhdsdysvds-dsdsdsds",
"status": "completed",
"startedAt": "2023-07-15T10:30:00.000Z",
"completedAt": "2023-07-15T10:35:45.000Z",
"processedCount": 850,
"totalCount": 850,
"limit": 1000,
"emailOnly": true,
"keywords": ["software engineer", "python", "data scientist"],
"location": ["New York", "Remote"]
}
This endpoint returns the current status of a lead generation process:
- Possible status values: "processing", "completed", "failed"
- Includes start and completion timestamps
- Shows how many leads have been processed so far
- Returns the original request parameters
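Because lead generation runs asynchronously, a client typically polls this endpoint until the status leaves "processing". The sketch below assumes a hypothetical get_status callable that returns the decoded JSON body; the interval and timeout values are arbitrary choices for the example.

```python
import time

TERMINAL_STATUSES = {"completed", "failed"}  # values listed in this README


def wait_for_completion(get_status, interval=5.0, timeout=600.0,
                        sleep=time.sleep, clock=time.monotonic):
    """Poll until the process completes or fails, or the timeout expires.

    get_status is a hypothetical zero-argument callable that performs
    GET /api/v1/leadgenerate/{processing_id}/status and returns a dict.
    """
    deadline = clock() + timeout
    while True:
        body = get_status()
        if body.get("status") in TERMINAL_STATUSES:
            return body
        if clock() >= deadline:
            raise TimeoutError("lead generation did not finish in time")
        sleep(interval)
```

The sleep and clock parameters are injectable so the loop can be tested without real waiting.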
Endpoint: GET /api/v1/leadgenerate/{processing_id}/results
For example: GET /api/v1/leadgenerate/skhdsdysvds-dsdsdsds/results?page=1&page_size=50
Query Parameters:
- page: Page number (default: 1)
- page_size: Number of results per page (default: 50, max: 100)
Response:
{
"processingId": "skhdsdysvds-dsdsdsds",
"status": "completed",
"pagination": {
"page": 1,
"pageSize": 50,
"totalItems": 850,
"totalPages": 17
},
"leads": [
{
"firstName": "John",
"lastName": "Doe",
"location": "New York, NY",
"email": "john.doe@example.com",
"company": "Tech Company",
"position": "Software Engineer",
"mobileNumber": "+1234567890",
"companySize": "1001-5000",
"profileUrl": "https://www.linkedin.com/in/johndoe"
},
// ... more leads
]
}
This endpoint returns the generated leads with pagination:
- Includes the current status of the process
- Provides pagination information (current page, total pages, etc.)
- Returns a list of leads with all their details
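To read every lead, a client can walk the pages using the pagination block in each response. In this sketch fetch_page is a hypothetical callable wrapping the GET request, and total_pages simply shows the ceiling division behind the example figures (850 items at 50 per page gives 17 pages).

```python
def total_pages(total_items, page_size):
    """Ceiling division: the number of pages needed for total_items."""
    return (total_items + page_size - 1) // page_size


def iter_leads(fetch_page, page_size=50):
    """Yield leads from every page, in order.

    fetch_page(page, page_size) is a hypothetical function returning the
    decoded JSON body of GET /api/v1/leadgenerate/{processing_id}/results.
    """
    page = 1
    while True:
        body = fetch_page(page, page_size)
        yield from body["leads"]
        if page >= body["pagination"]["totalPages"]:
            break
        page += 1
```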
Endpoint: DELETE /api/v1/leadgenerate/{processing_id}
For example: DELETE /api/v1/leadgenerate/skhdsdysvds-dsdsdsds
Response:
{
"processingId": "skhdsdysvds-dsdsdsds",
"processDeleted": true,
"leadsDeleted": 850
}
This endpoint deletes a lead generation process and all its generated leads:
- Removes the process document from the database
- Removes all leads associated with the process
- Returns the number of deleted items
- This service requires an Apify API key to work properly
- MongoDB is required for the lead generation feature
- The number of profiles you can scrape may be limited by your Apify subscription
- Please use responsibly and in accordance with LinkedIn's terms of service