Startups

After AgentGPT’s success, Reworkd pivots to web-scraping AI agents

Comment

Image Credits: Reworkd

Reworkd’s founders went viral on GitHub last year with AgentGPT, a free tool to build AI agents that acquired more than 100,000 daily users in a week. This earned them a spot in Y Combinator’s summer 2023 cohort, but the co-founders quickly realized building general AI agents was too broad. So now Reworkd is a web-scraping company, specifically building AI agents to extract structured data from the public web.

AgentGPT provided a simple interface in a browser where users could create autonomous AI agents. Soon, everyone was raving about how agents were the future of computing.

When the tool took off, Asim Shrestha, Adam Watkins, and Srijan Subedi were still living in Canada and Reworkd didn’t exist. The massive user influx caught them off guard; Subedi, now Reworkd’s COO, said the tool was costing them $2,000 a day in API calls. For that reason, they had to create Reworkd and get funded fast. One of the most popular use cases for AgentGPT was creating web scrapers, a relatively simple but high-volume task, so Reworkd made this its singular focus.

Web scrapers have become invaluable in the AI era. The number one reason organizations use public web data in 2024 is to build AI models, according to Bright Data’s latest report. The problem is that web scrapers are traditionally built by humans and must be customized for specific web pages, making them expensive. But Reworkd’s AI agents can scrape more of the web with fewer humans in the loop.

Customers can give Reworkd a list of hundreds, or even thousands, of websites to scrape and then specify the types of data they’re interested in. Then Reworkd’s AI agents use multimodal code generation to turn this into structured data. Agents generate unique code to scrape each website and extract that data for customers to use as they please.

For example, say you want stats on every NFL player, but every team’s website has a different layout. Instead of building a scraper for each website, Reworkd’s agents do that for you given just links and a description of the data you want to extract. With 32 teams, that could save you hours — but if there were 1,000 teams, it could save you weeks.

Reworkd raised a fresh $2.75 million in seed funding from Paul Graham, AI Grant (Nat Friedman and Daniel Gross’ startup accelerator), SV Angel, General Catalyst and Panache Ventures, among others, the startup exclusively told TechCrunch. Combined with a $1.25 million pre-seed investment last year from Panache Ventures and Y Combinator, this brings Reworkd’s total funding raised to date to $4 million.

AI that can use the internet

Shortly after forming Reworkd and moving to San Francisco, the team hired Rohan Pandey as a founding research engineer. He currently lives in AGI House SF, one of the Bay Area’s most popular hacker houses for the AI era. One investor described Pandey as a “one person research lab within Reworkd.”

“We see ourselves as the culmination of this 30-year dream of the Semantic Web,” said Pandey in an interview with TechCrunch, referring to a vision of world wide web inventor Tim Berners-Lee in which computers can read the entire internet. “Even though some websites don’t have markup, LLMs can understand the websites in the same ways that humans can, in such that we can expose basically any website as an API. So in some sense, Reworkd is like the universal API layer for the internet.”

Reworkd says it’s able to capture the long tail end of customer data needs, meaning its AI agents are specifically good for scraping thousands of smaller public websites that large competitors often skip over. Others, such as Bright Data, have scrapers for large websites like LinkedIn or Amazon already built out, but it may not be worth the trouble for a human to build a scraper for every small website. Reworkd addresses this concern, but potentially raises others.

What exactly is “public” web data?

Though web scrapers have existed for decades, they have attracted controversy in the AI era. Unfettered scraping of huge swathes of data has thrown OpenAI and Perplexity into legal trouble: News and media organizations allege the AI companies extracted intellectual property from behind a paywall, reproducing it widely without payment. Reworkd is taking precautions to avoid these issues.

“We look at it as uplifting the accessibility of publicly available information,” said Shrestha, co-founder and CEO of Reworkd, in an interview with TechCrunch. “We’re only allowing information that’s publicly available; we’re not going through sign-in walls or anything like that.”

To go a step further, Reworkd says it’s avoiding scraping news altogether, and being selective about who they work with. Watkins, the company’s CTO, says there are better tools for aggregating news content elsewhere, and it is not their focus.

As an example of what is, Reworkd described their work with Axis, a company that helps policy teams comply with government regulations. Axis uses Reworkd’s AI to extract data from thousands of government regulation documents for many countries across the European Union. Axis then trains and fine-tunes an AI model based on this data and offers it to clients as a product.

Starting a web-scraping company these days could be considered wading into dangerous territory, according to Aaron Fiske, partner at Silicon-Valley based law firm Gunderson Dettmer. The landscape is somewhat fluid right now, and the jury is still out on how “public” web data really is for AI models. However, Fiske says Reworkd’s approach, where customers decide what websites to scrape, may insulate them from legal liability.

“It’s like they invented the copying machine, and there’s this one use case for making copies that turned out to be hugely economically valuable, but also legally, really questionable,” said Fiske in an interview with TechCrunch. “It’s not like web scrapers servicing AI companies is necessarily risky, but working with AI companies that are really interested in harvesting copyrighted content is maybe an issue.”

That’s why Reworkd is being careful about who it works with. Web scrapers have obfuscated much of the blame in potential copyright infringement cases related to AI thus far. In the OpenAI case, Fiske points out that The New York Times did not sue the web scraper that collected its articles, but rather the company that allegedly reproduced its work. But even there, it’s yet to be decided if what OpenAI did was truly copyright infringement.

There’s more evidence that web scrapers are legally in the clear during the AI boom. A court recently ruled in favor of Bright Data after it scraped Facebook and Instagram profiles via the web. One example in the court case was a dataset of 615 million records of Instagram user data, which Bright Data sells for $860,000. Meta sued the company, alleging this violated its terms of service. But a court ruled that this data is public and therefore available to scrape.

Investors think Reworkd scales with the big guys

Reworkd has attracted big names as early investors, from Y Combinator and Paul Graham to Daniel Gross and Nat Friedman. Some investors say this is because Reworkd’s technology stands to improve, and get cheaper, alongside new models. The startup says OpenAI’s GPT-4o is currently the best for its multimodal code generation and that a lot of Reworkd’s technology wasn’t possible until just a few months ago.

“If you try to compete with the rate of technology progress — not building on top of it — then I think that you’ll have a hard time as a founder,” General Catalyst’s Viet Le told TechCrunch. “Reworkd has the mindset of basing its solution on the rate of progress.”

Reworkd is creating AI agents that address a particular gap in the market; companies need more data because AI is advancing quickly. As more companies build custom AI models specific to their business, Reworkd stands to gain more customers. Fine-tuning models necessitates quality, structured data, and lots of it.

Reworkd says its approach is “self-healing,” meaning that its web scrapers won’t break down due to a web page update. The startup claims to avoid hallucination issues traditionally associated with AI models because Reworkd’s agents are generating code to scrape a website. It’s possible the AI could make a mistake and grab the wrong data from a website, but Reworkd’s team created Banana-lyzer, an open source evaluation framework, to regularly assess its accuracy.

Reworkd doesn’t have a large payroll — the team is just four people — but it does have to take on considerable inference costs for running its AI agents. The startup expects its pricing to get increasingly competitive as these costs trend downward. OpenAI just released GPT-4o mini, a smaller version of its industry-leading model with competitive benchmarks. Innovations like these could make Reworkd more competitive.

Paul Graham and AI Grant did not respond to TechCrunch’s request for comment.

More TechCrunch

Ola Electric, India’s largest electric two-wheeler maker, saw its shares rise as much as 20% on its public debut on Friday, making it the biggest listing among Indian firms in…

Ola Electric surges in India’s biggest listing in two years

Rocket Lab surpassed $100 million in quarterly revenue for the first time, a 71% increase from the same quarter of last year. This is just one of several shiny accomplishments…

Rocket Lab’s sunny outlook bodes well for future constellation plans 

In 1996, two companies, Patersons HR and Payroll Solutions, formed a venture called CloudPay to provide payroll and payments services to enterprise clients. CloudPay grew quietly over the next several…

CloudPay, a payroll services provider, lands $120M in new funding

The vulnerabilities allowed one security researcher to peek inside the leak sites without having to log in.

Security bugs in ransomware leak sites helped save six companies from paying hefty ransoms

Featured Article

A comprehensive list of 2024 tech layoffs

The tech layoff wave is still going strong in 2024. Following significant workforce reductions in 2022 and 2023, this year has already seen 60,000 job cuts across 254 companies, according to independent layoffs tracker Layoffs.fyi. Companies like Tesla, Amazon, Google, TikTok, Snap and Microsoft have conducted sizable layoffs in the…

A comprehensive list of 2024 tech layoffs

A new “beta rabbit” mode adds some conversational AI chops to the Rabbit r1, particularly in more complex or multi-step instructions.

Rabbit’s r1 refines chats and timers, but its app-using ‘action model’ is still MIA

Los Angeles is notorious for its back-to-back traffic. Three events that promise to bring in millions of spectators from around the world — the 2026 World Cup, the Super Bowl…

Archer to set up air taxi network in LA by 2026 ahead of World Cup

Featured Article

Amazon is fumbling in India

Amazon’s decision to overlook quick-commerce in India is now looking like a significant misstep.

Amazon is fumbling in India

OpenAI’s GPT-4o, the generative AI model that powers the recently launched alpha of Advanced Voice Mode in ChatGPT, is the company’s first trained on voice as well as text and…

OpenAI finds that GPT-4o does some truly bizarre stuff sometimes

On Thursday, Box filled in a missing piece on its AI platform when it bought automated metadata extracting startup, Alphamoon.

Box adds crucial piece to its AI platform with Alphamoon acquisition

OpenAI has announced a new appointment to its board of directors: Zico Kolter. Kolter, a professor and director of the machine learning department at Carnegie Mellon, predominantly focuses his research…

OpenAI adds a Carnegie Mellon professor to its board of directors

Count Spotify and Epic Games among the Apple critics who are not happy with the iPhone maker’s newly revised compliance plan for the European Union’s Digital Markets Act (DMA). Shortly…

Spotify and Epic Games call Apple’s revised DMA compliance plan ‘confusing,’ ‘illegal’ and ‘unacceptable’

Thursday seeks to shake up conventional online dating in a crowded market. The app, which recently expanded to San Francisco, fosters intentional dating by restricting user access to Thursdays. At…

Thursday, the dating app that you can use only on Thursdays, expands to San Francisco

AI companies are gobbling up investor money and securing sky-high valuations early in their life cycle. This dynamic has many calling the AI industry a bubble. Nick Frosst, a co-founder…

Cohere co-founder Nick Frosst thinks everyone needs to be more realistic about what AI can and cannot do

Instagram is rolling out the ability for users to add up to 20 photos or videos to their feed carousels, as the platform embraces the trend of “photo dumps.” Back…

Instagram is embracing the ‘photo dump’

Welcome back to TechCrunch Mobility — your central hub for news and insights on the future of transportation. Sign up here for free — just click TechCrunch Mobility! Anyone paying…

Lyft ‘opens a can of whoop ass’ on surge pricing, Tesla’s Dojo explained and Saudi Arabia pumps $1.5B into Lucid

Flint Capital just closed its third fund at $160 million. Its has a unique strategy for finding its limited partner investors. 

Flint Capital raises a $160M through an unusual fund-raising strategy

Earlier this week it emerged that the DPC had instigated court proceedings seeking an injunction against X over the data processing without consent.

Elon Musk’s X agrees to pause EU data processing for training Grok

During testing, Google DeepMind’s table tennis bot was able to beat all of the beginner-level players it faced.

Google DeepMind develops a ‘solidly amateur’ table tennis robot

The X account announced that its Premium+ subscription would now be “fully” ad-free, leading some to question how this change would affect creator earnings.

As X sues advertisers over boycott, the app ditches all ads from its top subscription tier

Apple has further revised its compliance plan for the European Union’s Digital Markets Act (DMA) rulebook, which, since March, has forced it to give iOS developers more freedom over how…

Apple revises DMA compliance for App Store link-outs, applying fewer restrictions and a new fee structure

The rise of neobanks has been fascinating to witness, as a number of companies in recent years have grown from merely challenging traditional banks to being massive players in and…

Chime and Dave execs are coming to TechCrunch Disrupt 2024

If you visited the Wikipedia website on mobile this week, you might have seen a pop-up indicating that dark mode is ready for prime time.

How to enable Wikipedia’s dark mode

The home security company says attackers accessed databases containing customer home addresses, email addresses, and phone numbers.

Home security giant ADT says it was hacked

The Looking Glass Pro has a 6-inch display and a foldable base. It shows spatial images like those created with the Apple Vision Pro and iPhone 15 Pro.

Looking Glass’ new lineup includes a $300 phone-sized holographic display

TikTok’s latest offering is capitalizing on the app’s ability to serve as a discovery engine for other media — something its users already take advantage of by sharing short clips…

TikTok partners with Warner Bros. to become a discovery engine for TV and movies

Cocoon is a new startup built on the belief that greener steel production and the creation of concrete slag doesn’t have to be an either/or proposition.

Cocoon is transforming steel production runoff into a greener cement alternative

SoundHound, an AI company that makes voice interface tech used by car companies, restaurants and tech firms, is doubling down on enterprise services by playing consolidator in a crowded market.…

SoundHound acquires Amelia AI for $80M after it raised $189M+

Seeking mental health support is a complex process, but some founders believe that using AI to formalize techniques like cognitive behavioral therapy (CBT) can help folks who might not have…

Feeling Great’s new therapy app translates its psychiatrist co-founder’s experience into AI

The U.K.’s antitrust regulator has confirmed that it’s carrying out a formal antitrust investigation into Amazon’s ties with Anthropic, after Amazon recently completed a $4 billion investment into the AI startup.…

UK launches formal probe into Amazon’s ties with AI startup Anthropic