The title is self-explanatory.
I use Python with requests and BeautifulSoup.
(Writing this post I saw that there is a tutorial on how to use these tools.)
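For anyone new to the combo mentioned above, here is a minimal sketch of parsing with BeautifulSoup (a third-party package, `pip install beautifulsoup4`). The HTML string is hardcoded for illustration; in practice it would come from `requests.get(url).text`:

```python
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

HTML = """<html><body>
  <h1>Example</h1>
  <a href="/first">First</a>
  <a href="/second">Second</a>
</body></html>"""

# In a real scraper the HTML would be fetched with requests.get(url).text;
# here we parse a hardcoded string so the example is self-contained.
soup = BeautifulSoup(HTML, "html.parser")
links = [a["href"] for a in soup.find_all("a", href=True)]
print(links)  # ['/first', '/second']
```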
open-uri and nokogiri. Ruby.

libcurl and std::string::find in C++.
mechanize and nokogiri. Ruby is bae.

Although, requests in Python is pretty nice and light. Never used BeautifulSoup in Python before.
As you said, pretty nice, light, and easy to use.
Also, Python’s lxml.html together with cssselect is pretty neat if you rely more on CSS selectors; personally, I prefer it over BS4.
In addition to that, there is also Scrapy; depending on what you want to achieve, BS4 might not be suitable anymore.
I played around with BS4 and Scrapy a bit, and at first glance Scrapy looked a tad more powerful to me.
Anybody tried the HtmlProvider in FSharp.Data? I recently read a blog post claiming F# to be the best language for scraping the interwebs.
http://biarity.me/2016/11/23/Why-F-is-the-best-langauge-for-web-scraping/
requests + BeautifulSoup is the bomb.
Python’s BeautifulSoup and requests for smaller projects, and Scrapy when I need a powerful library for a project.
Requests and BeautifulSoup make it go boom!!
I have scraped most sites with bs4, requests, and sqlite/sqlalchemy (with a MySQL backend).
I did it multi-threaded sometimes too.
I can’t say that writing the code was pleasant, but the results were really efficient!
I used to be a python-requests fan until I stumbled upon python-lxml. Just wanted to give some more attention to your answer because it is, to my knowledge, pretty uncommon.
This topic was automatically closed after 34 hours. New replies are no longer allowed.