Search startup jobs with Python and LLMs

DDD
Release: 2025-01-27 20:15:13

Job openings posted on company websites do not always appear on mainstream job boards. Finding a job at a remote startup, for example, can be hard because these companies may not list their openings on job boards at all. To find such jobs, you need to:

  • Find promising companies
  • Look up their careers pages
  • Go through the lists of open positions
  • Record the job details manually

This is very time-consuming, so we will automate it.

Preparation

We will use the Parsera library to automate the job scraping. Parsera provides two usage options:

  • Local mode: pages are processed on your machine using an LLM of your choice;
  • API mode: all processing is performed on Parsera's servers.

In this example, we will use local mode because this is a one-time, small-scale extraction. First, install the required packages:

<code>pip install parsera
playwright install</code>

Since we are running a local setup, an LLM connection is needed. For simplicity, we will use OpenAI's gpt-4o-mini, which only requires setting one environment variable:

<code>import os
from parsera import Parsera

os.environ["OPENAI_API_KEY"] = "<your_openai_api_key_here>"

# No model argument is passed, so Parsera falls back to its default OpenAI model
scraper = Parsera()</code>

With the setup complete, we can start scraping.

Step 1: Get a list of startups that recently raised a Series A round

First, we need a list of the companies we are interested in, along with their websites. I found a list of 100 startups that closed a Series A round last month. Growing companies that have just raised a new round seem like a good choice.

Let's get the countries and websites of these companies:

<code>url = "https://growthlist.co/series-a-startups/"
elements = {
    "Website": "Website of the company",
    "Country": "Country of the company",
}
all_startups = await scraper.arun(url=url, elements=elements)</code>

With the country information, we can filter for the countries we are interested in. Let's narrow the search to the United States:

<code>us_websites = [
    item["Website"] for item in all_startups if item["Country"] == "United States"
]</code>

Step 2: Find the careers pages

Now we have a list of websites of Series A startups from the United States. The next step is to find their careers pages. We will extract the careers page link directly from each homepage:

<code>from urllib.parse import urljoin

# Define our target
careers_target = {"url": "Careers page url"}

careers_pages = []
for website in us_websites:
    website = "https://" + website
    result = await scraper.arun(url=website, elements=careers_target)
    if len(result) > 0:
        url = result[0]["url"]
        # Resolve relative links against the company homepage
        if url.startswith("/") or url.startswith("./"):
            url = urljoin(website, url)
        careers_pages.append(url)</code>

Note that you can optionally replace this step with a search API, swapping the LLM calls for search calls.

Step 3: Scrape the open positions

The last step is to load all open positions from the careers pages. Assuming we are looking for software engineering positions, we will extract the job title, location, link, and whether the position is related to software engineering:

<code>jobs_target = {
    "Title": "Title of the position",
    "Location": "Location of the position",
    "Link": "Link to the job posting",
    "SE": "True if this is a software engineering position, otherwise False",
}

jobs = []
for page in careers_pages:
    result = await scraper.arun(url=page, elements=jobs_target)
    if len(result) > 0:
        for row in result:
            # Remember which careers page the row came from
            row["url"] = page
            # Turn relative job links into absolute ones
            row["Link"] = urljoin(row["url"], row["Link"])
    jobs.extend(result)</code>

Once all positions are extracted, we can filter out the non-software-engineering positions and save the rest to a .csv file.

Finally, we get a table with the list of positions, as shown below:

Job title | Location | Link | Software engineering position | URL
AI Tech Lead Manager | Bangalore | https://job-boards.greenhouse.io/enterpret/jobs/6286095003 | True | https://boards.greenhouse.io/enterpret/
Backend Developer | Tel Aviv | https://www.upwind.io/careers/co/tel-aviv/BA.04A/backend-developer/all#jobs | True | https://www.upwind.io/careers
... | ... | ... | ... | ...
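
The filtering and CSV export in Step 3 has no listing in this article. A minimal sketch of it follows, assuming `jobs` is the list of dictionaries built in Step 3 and that the model may return the `SE` flag either as a boolean or as the string "True". The helper names `is_engineering` and `save_engineering_jobs` and the file name `jobs.csv` are illustrative and not part of Parsera:

```python
import csv


def is_engineering(job):
    """Return True when the row was flagged as a software engineering job."""
    # Assumption: the flag may arrive as a bool or as text such as "True".
    return str(job.get("SE", "")).strip().lower() == "true"


def save_engineering_jobs(jobs, path="jobs.csv"):
    """Write the software engineering rows of `jobs` to a CSV file."""
    engineering_jobs = [job for job in jobs if is_engineering(job)]
    if not engineering_jobs:
        return engineering_jobs  # nothing to write, so no file is created
    # Use the keys of the first row as the CSV header.
    fieldnames = list(engineering_jobs[0].keys())
    with open(path, "w", newline="", encoding="utf-8") as csvfile:
        # Extra keys on later rows are ignored; missing keys become empty cells.
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(engineering_jobs)
    return engineering_jobs
```

Calling `save_engineering_jobs(jobs)` after the Step 3 loop would then write only the rows flagged as software engineering positions. Normalizing the flag through `str(...)` is a defensive choice, since LLM output types are not guaranteed to be consistent from page to page.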

Conclusion

As a next step, we can repeat the same process to extract more information from the full job postings, for example the tech stack, or to filter for jobs at remote startups. This saves the time of reviewing every page manually. You can try iterating over the Link field yourself and extracting the elements you are interested in.
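
As one possible way to iterate over the Link field, the sketch below visits each job posting and merges extra fields into the row. It relies only on the `scraper.arun(url=..., elements=...)` call used throughout this article. The helper name `enrich_jobs`, the field descriptions in `details_target`, and the skip-on-error behavior are assumptions made for illustration:

```python
import asyncio


async def enrich_jobs(scraper, jobs, elements):
    """Visit each job's Link and merge the extracted fields into a copy of the row."""
    enriched = []
    for job in jobs:
        row = dict(job)  # copy, so the input list is left unchanged
        try:
            result = await scraper.arun(url=row["Link"], elements=elements)
        except Exception as error:  # one broken page should not stop the run
            print(f"Skipping {row['Link']}: {error}")
            result = []
        if result:
            # Keep the first extracted record for this posting.
            row.update(result[0])
        enriched.append(row)
    return enriched


# Hypothetical field descriptions for the extra details.
details_target = {
    "Stack": "Technologies and programming languages mentioned in the posting",
    "Remote": "True if the position is remote, otherwise False",
}

# In a notebook: detailed = await enrich_jobs(scraper, jobs, details_target)
# In a script:   detailed = asyncio.run(enrich_jobs(scraper, jobs, details_target))
```

Pages are fetched one at a time here to keep the sketch simple. For a long list of postings, running the requests concurrently would be faster but also harder on the target sites and on the LLM rate limits.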

I hope you found this article helpful and please let me know if you have any questions.

source:php.cn