
Search startup jobs with Python and LLMs

Jan 27, 2025, 08:15 PM


Jobs posted on company websites cannot always be found on mainstream job boards. For example, finding remote jobs at startups can be challenging because these companies may not be listed on job boards at all. To find these jobs, you need to:

  • Find promising companies
  • Search for their careers pages
  • Go through the list of open positions
  • Record the job details manually

This is very time-consuming, but we will automate it.

Preparation

We will use the Parsera library to automate the job scraping. Parsera provides two usage options:

  • Local mode: pages are processed on your machine using an LLM of your choice;
  • API mode: all processing is performed on Parsera's servers.

In this example, we will use local mode because this is a one-time, small-scale extraction. First, install the required packages:

<code>pip install parsera
playwright install</code>

Since we are running locally, an LLM connection is needed. For simplicity, we will use OpenAI's gpt-4o-mini, which only requires setting one environment variable:

<code>import os
from parsera import Parsera

os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_API_KEY_HERE"

# No model argument: Parsera is expected to fall back to its default
# OpenAI model, configured through OPENAI_API_KEY
scraper = Parsera()</code>

With the setup complete, we can start scraping.

Step 1: Get a list of the latest Series A startups
First, we need a list of companies we are interested in, along with their websites. I found a list of 100 startups that closed a Series A round last month. Growing companies with fresh funding seem like a good choice.

Let's get the countries and websites of these companies:

<code>url = "https://growthlist.co/series-a-startups/"
elements = {
    "Website": "Website of the company",
    "Country": "Country of the company",
}
# Top-level `await` works in a notebook; in a script, run it inside an async function
all_startups = await scraper.arun(url=url, elements=elements)</code>

With the country information, we can filter for the countries we are interested in. Let's narrow the search to the United States:

<code>us_websites = [
    item["Website"] for item in all_startups if item["Country"] == "United States"
]</code>

Step 2: Find the careers pages

Now we have a list of websites of Series A startups from the United States. The next step is to find their careers pages. We will extract the careers page link directly from each homepage:

<code>from urllib.parse import urljoin

# Define our target
careers_target = {"url": "Careers page url"}

careers_pages = []
for website in us_websites:
    website = "https://" + website
    result = await scraper.arun(url=website, elements=careers_target)
    if len(result) > 0:
        url = result[0]["url"]
        # Resolve relative links against the homepage
        if url.startswith("/") or url.startswith("./"):
            url = urljoin(website, url)
        careers_pages.append(url)</code>

Note that you could optionally replace this step with a search API, swapping the LLM call for a search call.

Step 3: Scrape the open positions
The last step is to load all open positions from the careers pages. Assuming we are looking for software engineering positions, we will extract the job title, location, link, and whether the position is related to software engineering:

<code>jobs_target = {
    "Title": "Title of the position",
    "Location": "Location of the position",
    "Link": "Link to the job posting",
    "SE": "True if this is a software engineering position, otherwise False",
}

jobs = []
for page in careers_pages:
    result = await scraper.arun(url=page, elements=jobs_target)
    if len(result) > 0:
        for row in result:
            # Remember the source page and make relative job links absolute
            row["url"] = page
            row["Link"] = urljoin(row["url"], row["Link"])
    jobs.extend(result)</code>

Once all positions are extracted, we can filter out the non-software-engineering ones and save the rest to a .csv file.

Finally, we get a table of positions like the one below:
Title | Location | Link | SE | url
AI Tech Lead Manager | Bangalore | https://job-boards.greenhouse.io/enterpret/jobs/6286095003 | True | https://boards.greenhouse.io/enterpret/
Backend Developer | Tel Aviv | https://www.upwind.io/careers/co/tel-aviv/BA.04A/backend-developer/all#jobs | True | https://www.upwind.io/careers
... | ... | ... | ... | ...
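The filter-and-save step mentioned above can be sketched with the standard library alone. This is a minimal sketch, not the author's code: the helper name `save_se_jobs` and the output path are hypothetical, and it assumes each row is a dict with the Title, Location, Link, SE, and url keys used earlier. Because the SE value comes from an LLM, the sketch accepts either a boolean or a string such as "True".

```python
import csv


def save_se_jobs(jobs, path):
    """Keep rows flagged as software engineering and write them to a CSV file.

    Hypothetical helper: assumes each row is a dict with the keys listed in `fields`.
    """
    fields = ["Title", "Location", "Link", "SE", "url"]
    # The flag may arrive as a bool or as a string such as "True" / "true"
    se_jobs = [
        row for row in jobs if str(row.get("SE")).strip().lower() == "true"
    ]
    with open(path, "w", newline="", encoding="utf-8") as f:
        # Keys outside `fields` are ignored instead of raising an error
        writer = csv.DictWriter(f, fieldnames=fields, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(se_jobs)
    return se_jobs
```

For example, `save_se_jobs(jobs, "jobs.csv")` would write only the software engineering rows to a file in the current directory.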
Conclusion

Next, we can repeat the same process to extract more information from each full job listing. For example, you could extract the tech stack or filter for remote positions, which saves the time of reviewing every page manually. You can try iterating over the Link field yourself and extracting the elements you are interested in.
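Iterating over the Link field could look roughly like the sketch below. It is an illustration under assumptions rather than part of the original walkthrough: `enrich_jobs` is a hypothetical helper, `extract` stands in for an async call such as `scraper.arun`, and the result is assumed to be a list of dicts, as in the earlier steps.

```python
import asyncio


async def enrich_jobs(jobs, extract, details_target):
    """Visit each job's Link and merge the extracted fields into a copy of the row.

    `extract` is a stand-in for an async scraper call that accepts `url` and
    `elements` keyword arguments and returns a list of dicts.
    """
    enriched = []
    for row in jobs:
        result = await extract(url=row["Link"], elements=details_target)
        # Use the first extracted record, or nothing if the page yielded no data
        details = result[0] if result else {}
        enriched.append({**row, **details})
    return enriched
```

In a notebook, this might be called as `jobs = await enrich_jobs(jobs, scraper.arun, {"Stack": "Main technologies mentioned in the posting"})`, where the "Stack" element is only an example description.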

I hope you found this article helpful and please let me know if you have any questions.
