What tools do you use for web scraping?

Tags: scraping, tools, web

(CGMS) #1

The title is self-explanatory.
I use Python with requests and BeautifulSoup.
(While writing this post, I saw that there is already a tutorial on how to use these tools :slight_smile:)
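A minimal sketch of the requests + BeautifulSoup combo mentioned above. The URL, the `h2.title a` selector, and the function names are placeholders, not from any real site:

```python
import requests
from bs4 import BeautifulSoup

def extract_titles(html):
    """Pull link texts out of <h2 class="title"> headings (placeholder selector)."""
    soup = BeautifulSoup(html, "html.parser")
    return [a.get_text(strip=True) for a in soup.select("h2.title a")]

def scrape(url):
    """Fetch the page, fail loudly on HTTP errors, then parse."""
    resp = requests.get(url, timeout=10)
    resp.raise_for_status()
    return extract_titles(resp.text)

# Parsing works the same on any HTML string, no network needed:
sample = '<h2 class="title"><a href="/a">First post</a></h2>'
print(extract_titles(sample))  # ['First post']
```

Splitting fetching from parsing like this keeps the parsing logic testable offline.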


(oaktree) #2

open-uri and nokogiri in Ruby.

libcurl and std::string::find in C++.


(Command-Line Ninja) #3

mechanize and nokogiri. Ruby is bae. That said,

requests in Python is pretty nice and light. I've never used BeautifulSoup in Python before.


#4

As you said, it's pretty nice, light, and easy to use.


(ouroborus) #5

Also, Python's lxml.html together with cssselect is pretty neat if you rely more on CSS selectors; personally, I prefer it over BS4.


#6

In addition, there is also Scrapy; depending on what you want to achieve, BS4 might not be suitable anymore.

I've played around with BS4 and Scrapy a bit, and at first glance Scrapy looked a tad more powerful to me.


(Joshua Jensch) #7

Has anybody tried the HtmlProvider in FSharp.Data?
I recently read a blog post claiming F# to be the best language for scraping the interwebs.
http://biarity.me/2016/11/23/Why-F-is-the-best-langauge-for-web-scraping/


(Full Snack Developer) #8

requests + BeautifulSoup is the bomb


#9

Python's BeautifulSoup and requests for smaller projects, and Scrapy when I need a powerful library for a project.


(Temiloluwa) #10

Requests and BeautifulSoup make it go boom!!


(John) #11

I have scraped most sites with bs4, requests, and sqlite/SQLAlchemy (with a MySQL backend).
Sometimes I did it multi-threaded, too.

I can't say that writing the code was pleasant, but the results were really efficient!
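The storage half of a setup like that can be sketched with the stdlib sqlite3 module (for brevity; the poster used SQLAlchemy). The `pages` table and the sample rows are made up:

```python
import sqlite3

# Stand-in for rows a bs4 scrape would produce
rows = [("First post", "/a"), ("Second post", "/b")]

conn = sqlite3.connect(":memory:")  # use a file path for a persistent DB
conn.execute("CREATE TABLE pages (title TEXT, url TEXT UNIQUE)")
# INSERT OR IGNORE keeps re-runs idempotent when a page is already stored
conn.executemany("INSERT OR IGNORE INTO pages VALUES (?, ?)", rows)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM pages").fetchone()[0]
print(count)  # 2
```

A UNIQUE constraint on the URL plus `INSERT OR IGNORE` is a simple way to make repeated crawls safe without checking for duplicates in Python.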


#14

I used to be a python-requests fan until I stumbled upon python-lxml. Just wanted to give some more attention to your answer because it is, to my knowledge, pretty uncommon.


(Command-Line Ninja) #15

This topic was automatically closed after 34 hours. New replies are no longer allowed.