What tools do you use for web scraping?


(CGMS) #1

The title is self-explanatory.
I use Python with requests and BeautifulSoup.
(Writing this post I saw that there is a tutorial on how to use these tools :slight_smile:)
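A minimal sketch of the requests + BeautifulSoup workflow this post describes. The sample markup is made up for illustration, and the actual network fetch is left commented out so the snippet runs offline:

```python
from bs4 import BeautifulSoup
# import requests  # for the actual fetch; commented out so the sketch runs offline

def extract_links(html):
    """Parse HTML and return (text, href) pairs for every anchor tag."""
    soup = BeautifulSoup(html, "html.parser")
    return [(a.get_text(strip=True), a.get("href")) for a in soup.find_all("a")]

# A real run would start with something like:
# html = requests.get("https://example.com").text

sample = '<ul><li><a href="/a">First</a></li><li><a href="/b">Second</a></li></ul>'
print(extract_links(sample))  # [('First', '/a'), ('Second', '/b')]
```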

(oaktree) #2

open-uri and nokogiri in Ruby.

libcurl and std::string::find in C++.

(Security Architect & Founder) #3

mechanize and nokogiri. Ruby is bae. That said,

requests in Python is pretty nice and light. I've never used BeautifulSoup in Python before.


As you said: pretty nice, light, and easy to use.

(ouroborus) #5

Also, Python's lxml.html together with cssselect is pretty neat if you rely more on CSS selectors; personally, I prefer it over bs4.


In addition to that, there is also scrapy; depending on what you want to achieve, BS4 might not be suitable anymore.

I played around with BS4 and scrapy a bit, and at first glance scrapy looked a tad more powerful to me.

(Joshua Jensch) #7

Anybody tried the HtmlProvider in Fsharp.Data?
I recently read a blog post claiming F# is the best language for scraping the interwebs.

(Full Snack Developer) #8

requests + BeautifulSoup is the bomb


Python's BeautifulSoup and requests for smaller projects, and scrapy when I need a more powerful library.

(Temiloluwa) #10

Requests and beautifulsoup make it go boom!!

(John) #11

I have scraped most sites with bs4, requests, and sqlite/SQLAlchemy (with a MySQL backend).
Sometimes I made it multi-threaded, too.

I can’t say that writing the code was pleasant, but the results were really efficient!
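A sketch of the multi-threaded bs4 + sqlite pattern this post mentions, with the pages and schema made up for illustration. Parsing happens in worker threads while a single connection in the main thread does the inserts, since sqlite3 connections are not safe to share across threads by default:

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor
from bs4 import BeautifulSoup

# Stand-ins for fetched pages; a real scraper would download these
PAGES = [
    "<h1>Page one</h1>",
    "<h1>Page two</h1>",
]

def parse_title(html):
    """CPU-light parsing step that can run in a worker thread."""
    return BeautifulSoup(html, "html.parser").h1.get_text()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (title TEXT)")

# Parse concurrently, but keep all DB writes on the main thread
with ThreadPoolExecutor(max_workers=2) as pool:
    for title in pool.map(parse_title, PAGES):
        conn.execute("INSERT INTO pages (title) VALUES (?)", (title,))

rows = [r[0] for r in conn.execute("SELECT title FROM pages ORDER BY title")]
print(rows)  # ['Page one', 'Page two']
```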

(Security Architect & Founder) #13


I used to be a python-requests fan until I stumbled upon python-lxml. Just wanted to give some more attention to your answer because it is, to my knowledge, pretty uncommon.

(Security Architect & Founder) #15

This topic was automatically closed after 34 hours. New replies are no longer allowed.