What tools do you use for web scraping?


(CGMS) #1

The title is self-explanatory.
I use Python with requests and BeautifulSoup.
(While writing this post, I saw that there is a tutorial on how to use these tools 🙂)
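A minimal sketch of the requests + BeautifulSoup combination, assuming you want the links from a page. The function names and the User-Agent string are illustrative assumptions. Fetching and parsing are kept separate so the parsing step can be tried on a plain HTML string without touching the network.

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url: str) -> str:
    """Download a page and raise on HTTP errors instead of parsing an error page."""
    # Hypothetical User-Agent; a timeout avoids hanging forever on a dead server
    response = requests.get(url, timeout=10, headers={"User-Agent": "my-scraper/0.1"})
    response.raise_for_status()
    return response.text


def extract_links(html: str) -> list[tuple[str, str]]:
    """Return (link text, href) pairs for every <a> tag that has an href attribute."""
    soup = BeautifulSoup(html, "html.parser")  # stdlib parser, no extra dependency
    return [(a.get_text(strip=True), a["href"]) for a in soup.find_all("a", href=True)]
```

Typical use would be `extract_links(fetch_html("https://example.com/"))`.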

(oaktree) #2

open-uri and nokogiri (Ruby).

libcurl and std::string::find in C++.

(Command-Line Ninja) #3

mechanize and nokogiri. Ruby is bae.

That said, requests in Python is pretty nice and light. I've never used BeautifulSoup in Python before.


As you said, it's pretty nice, light, and easy to use.

(ouroborus) #5

Also, Python's lxml.html together with cssselect is pretty neat if you rely more on CSS selectors; personally, I prefer it over bs4.

(mad scientist and king skid) #6

In addition to that, there is also Scrapy. Depending on what you want to achieve, BS4 might no longer be suitable.

I played around with BS4 and Scrapy a bit, and at first glance Scrapy looked a tad more powerful to me.

(Joshua Jensch) #7

Has anybody tried the HtmlProvider in FSharp.Data?
I recently read a blog post claiming F# is the best language for scraping the interwebs.

(Full Snack Developer) #8

requests + BeautifulSoup is the bomb


Python's BeautifulSoup and requests for smaller projects, and Scrapy when a project needs a more powerful library.

(Temiloluwa) #10

Requests and BeautifulSoup make it go boom!!

(John) #11

I have scraped most sites with bs4, requests, and sqlite/SQLAlchemy (with a MySQL backend).
Sometimes I made it multi-threaded, too.

I can’t say that writing the code was pleasant, but the results were really efficient!
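A rough sketch of that kind of workflow: requests + bs4 for fetching and parsing, a database for storage, and threads for concurrency. It swaps in the standard-library `sqlite3` module instead of SQLAlchemy/MySQL to stay self-contained; the table layout, function names and worker count are hypothetical. Only the network-bound fetching runs in threads, while all database writes happen on the calling thread, since a `sqlite3` connection is not meant to be shared across threads by default.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor

import requests
from bs4 import BeautifulSoup


def fetch(url: str) -> str:
    """Download one page; raises on HTTP errors."""
    resp = requests.get(url, timeout=10)
    resp.raise_for_status()
    return resp.text


def extract_title(html: str) -> str:
    """Return the page <title> text, or an empty string if there is none."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.title.get_text(strip=True) if soup.title else ""


def scrape_to_db(urls, db_path=":memory:", fetcher=fetch, workers=4):
    """Fetch URLs concurrently, parse titles, and store (url, title) rows in SQLite."""
    urls = list(urls)  # Executor.map consumes its input eagerly, so materialize it once
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, title TEXT)")
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order, so zip() pairs each URL with its page
        rows = [(url, extract_title(html)) for url, html in zip(urls, pool.map(fetcher, urls))]
    with conn:  # commits the transaction on success
        conn.executemany("INSERT OR REPLACE INTO pages (url, title) VALUES (?, ?)", rows)
    return conn
```

Passing the fetcher as a parameter makes it easy to try the pipeline with canned HTML instead of live requests.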

(Command-Line Ninja) #13


I used to be a python-requests fan until I stumbled upon python-lxml. Just wanted to give some more attention to your answer because it is, to my knowledge, pretty uncommon.
