What tools do you use for web scraping?


I use Python with requests and BeautifulSoup.
(While writing this post I saw that there is a tutorial on how to use these tools :))
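
For anyone who hasn't seen the combo before, here is a minimal sketch of how the two usually fit together. It is not taken from the tutorial; the function names, the sample HTML and the example URL are all made up for illustration.

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url):
    """Download a page and return its HTML text (raises on HTTP errors)."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text


def extract_links(html):
    """Return (text, href) pairs for every <a> tag that has an href."""
    soup = BeautifulSoup(html, "html.parser")
    return [(a.get_text(strip=True), a["href"]) for a in soup.find_all("a", href=True)]


# Inline sample (hypothetical) so the parsing step can be tried offline.
SAMPLE_HTML = """
<html><body>
  <a href="/first">First post</a>
  <a href="/second">Second post</a>
  <a>No link here</a>
</body></html>
"""

if __name__ == "__main__":
    print(extract_links(SAMPLE_HTML))
    # For a real page: extract_links(fetch_html("https://example.com"))
```

requests only handles the download and BeautifulSoup only handles the parsing, so the two halves can be tested separately.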

open-uri and nokogiri, in Ruby.

libcurl and std::string::find in C++.

mechanize and nokogiri. Ruby is bae. Although, requests in Python is pretty nice and light. I've never used BeautifulSoup in Python before.


As you said, pretty nice, light and easy to use.

Also, Python's lxml.html together with cssselect is pretty neat if you rely more on CSS selectors; personally, I prefer it over bs4.


In addition to that, there is also Scrapy. Depending on what you want to achieve, BS4 might not be suitable anymore.

I played around with BS4 and Scrapy a bit, and at first glance Scrapy looked a tad more powerful to me.

Has anybody tried the HtmlProvider in FSharp.Data?
I recently read a blog post claiming F# to be the best language for scraping the interwebs.

requests + BeautifulSoup is the bomb


Python’s BeautifulSoup and requests for smaller projects, and Scrapy when a project needs a more powerful library.

Requests and BeautifulSoup make it go boom!!

I have scraped most sites with bs4, requests, and sqlite/SQLAlchemy (with a MySQL backend).
I sometimes did it multi-threaded too.
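
To give an idea of the multi-threaded variant, here is a simplified sketch and not the actual code. It uses the standard library's sqlite3 instead of SQLAlchemy/MySQL. The `scrape_all` and `parse_title` names, the `pages` table and the `fetch` hook are all hypothetical.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor

from bs4 import BeautifulSoup


def parse_title(html):
    """Pull the <title> text out of a page, or return '' if there is none."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.title.get_text(strip=True) if soup.title else ""


def scrape_all(urls, fetch, db_path=":memory:", workers=4):
    """Fetch and parse pages in worker threads, then store the rows.

    `fetch` is any callable mapping a URL to HTML text (a hypothetical hook,
    so the sketch can be run without a network).
    """
    def work(url):
        return url, parse_title(fetch(url))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        rows = list(pool.map(work, urls))  # results come back in input order

    # By default a sqlite3 connection should stay in the thread that made it,
    # so all writes happen here, after the workers have finished.
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, title TEXT)")
    conn.executemany("INSERT OR REPLACE INTO pages VALUES (?, ?)", rows)
    conn.commit()
    return conn
```

With requests, the hook could be something like `lambda url: requests.get(url, timeout=10).text`. Keeping the database writes in one thread avoids most of the unpleasant parts.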

I can’t say that writing the code was pleasant, but the results were really efficient!

I used to be a python-requests fan until I stumbled upon python-lxml. Just wanted to give some more attention to your answer because it is, to my knowledge, pretty uncommon.

