Larbin is a powerful web crawler (also called [web] robot, spider...). It
is intended to fetch a large number of web pages to fill the database of a
search engine. With a network fast enough, Larbin is able to fetch more than
100 million pages on a standard PC.
Larbin was initially developed for the XYLEME project in the VERSO team at
INRIA. The goal of Larbin was to go and fetch XML pages on the web to fill
the database of an XML-oriented search engine.
The following can be done with Larbin:
o A crawler for a search engine
o A crawler for a specialized search engine (XML, images, mp3...)
o Statistics on the web (about servers or page contents)
Larbin was created by Sebastien Ailleret.
Plack::Middleware::RemoveRedundantBody removes the body from an HTTP response
when it is not required.
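The idea can be sketched as a Python WSGI middleware. This is not the Perl
module's code: it assumes "not required" means the status codes that HTTP
defines as carrying no body (1xx, 204 and 304), and the names no_body_expected
and RemoveRedundantBody below are hypothetical.

```python
def no_body_expected(status_code):
    """Return True for statuses that HTTP defines as having no body."""
    return 100 <= status_code < 200 or status_code in (204, 304)


class RemoveRedundantBody:
    """Hypothetical WSGI middleware: drop the body when none is expected."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        seen = {}

        def recording_start_response(status, headers, exc_info=None):
            # Remember the numeric status code, then pass the call through.
            seen["code"] = int(status.split(" ", 1)[0])
            return start_response(status, headers, exc_info)

        body = self.app(environ, recording_start_response)

        # Simplification: apps that call start_response lazily (for example
        # generators) are passed through untouched, and headers are not edited.
        if "code" in seen and no_body_expected(seen["code"]):
            if hasattr(body, "close"):
                body.close()
            return []
        return body
```

Wrapping an application is then a matter of app = RemoveRedundantBody(app).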
json-py is a simple, pure-python implementation of a JSON (http://json.org)
reader and writer. JSON is used to exchange data across systems written in
various languages. It is particularly suited to dynamic languages like Python,
JavaScript, etc. As the name (JavaScript Object Notation) implies, it is
suitable for AJAX applications that exchange data between servers and
JavaScript applications running in web browser clients.
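A minimal sketch of the round trip a JSON reader and writer performs. It uses
the json module from Python's standard library as a stand-in, not json-py's
own API, and the record contents are made up for illustration.

```python
import json

# A made-up record mixing the basic JSON types.
record = {
    "name": "larbin",
    "tags": ["crawler", "web"],
    "pages": 100,
    "active": True,
    "note": None,
}

# Writer: Python object -> JSON text that any language can parse.
text = json.dumps(record, sort_keys=True)

# Reader: JSON text -> Python object.
restored = json.loads(text)
```

Python's True and None are written as the JSON literals true and null, and
come back unchanged when the text is read again.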
Net::Amazon provides an object-oriented interface to amazon.com's
SOAP and XML/HTTP interfaces. This way it's possible to create applications
using Amazon's vast amount of data via a functional interface, without
having to worry about the underlying communication mechanism.
LWPx::TimedHTTP performs an HTTP request exactly as LWP normally does, except
that it times each stage of the request and then inserts the results as
headers.
It's useful for debugging where in a connection slowdowns are occurring.
Net::TiVo provides an object-oriented interface to TiVo's REST interface. This
makes it possible to query your TiVo for information about recorded content,
such as a show's download URL, and space consumed.
ck4up is a small command-line utility, written in Ruby. ck4up scans through a
configuration file, fetches the listed URLs from the web, computes the md5sum
of each page, and compares the value with the one stored in a gdbm database.
If the two differ, a message is written to the standard output.
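The comparison step can be sketched in Python as follows. This is an
illustration, not ck4up's Ruby code: check_page is a hypothetical name, a
plain dict stands in for the gdbm database, the page content is passed in
rather than fetched, and both the message text and the choice to store the
new checksum on every run are assumptions.

```python
import hashlib


def check_page(store, url, page_bytes):
    """Return a message if the page's MD5 differs from the stored one."""
    digest = hashlib.md5(page_bytes).hexdigest()
    previous = store.get(url)
    # Assumption: remember the latest checksum after every check.
    store[url] = digest
    # Assumption: a URL seen for the first time is not reported.
    if previous is not None and previous != digest:
        return url + " has changed"
    return None


store = {}  # stand-in for the gdbm database
first = check_page(store, "http://example.com/", b"version 1")
same = check_page(store, "http://example.com/", b"version 1")
changed = check_page(store, "http://example.com/", b"version 2")
```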
"httpry is a specialized packet sniffer designed for displaying and logging
HTTP traffic. It is not intended to perform analysis itself, but to capture,
parse, and log the traffic for later analysis. It can be run in real-time
displaying the traffic as it is parsed, or as a daemon process that logs to an
output file. It is written to be as lightweight and flexible as possible, so
that it can be easily adaptable to different applications."
The XCAP protocol, defined in RFC 4825, allows a client to read, write, and
modify application configuration data stored in XML format on a server. XCAP
maps XML document sub-trees and element attributes to HTTP URIs, so that
these components can be directly accessed by HTTP. XCAP clients use an XCAP
server to store data such as presence policy. Combined with a SIP Presence
server that supports the PUBLISH/SUBSCRIBE/NOTIFY SIP methods, it can
provide a complete SIP SIMPLE solution.
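The mapping from an XML component to an HTTP URI can be sketched in Python.
In RFC 4825 a URI consists of the XCAP root, a document selector and,
optionally, a node selector separated from the document part by "~~". The
function name xcap_uri, the host name and the document below are
illustrative assumptions, and the percent-encoding is simplified.

```python
from urllib.parse import quote


def xcap_uri(xcap_root, document_selector, node_selector=None):
    """Build an XCAP URI for a document or for a node inside it."""
    uri = xcap_root.rstrip("/") + "/" + quote(
        document_selector.strip("/"), safe="/:@"
    )
    if node_selector:
        # "~~" separates the document selector from the node selector.
        uri += "/~~/" + quote(node_selector.strip("/"), safe="/@=")
    return uri


# Hypothetical example: the list named "friends" in a resource-lists document.
uri = xcap_uri(
    "http://xcap.example.com",
    "resource-lists/users/sip:bill@example.com/index",
    'resource-lists/list[@name="friends"]',
)
```

A client would then issue an ordinary HTTP GET, PUT or DELETE on that URI to
read, write or remove just that part of the document.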
VCR.py simplifies and speeds up tests that make HTTP requests. The first
time you run code that is inside a VCR.py context manager or decorated
function, VCR.py records all HTTP interactions that take place through
the libraries it supports and serializes and writes them to a flat file
(in YAML format by default). This flat file is called a cassette.
When the relevant piece of code is executed again, VCR.py will read the
serialized requests and responses from the aforementioned cassette file,
and intercept any HTTP requests that it recognizes from the original test
run and return the responses that corresponded to those requests. This
means that the requests will not actually result in HTTP traffic, which
confers several benefits including:
* The ability to work offline
* Completely deterministic tests
* Increased test execution speed
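The record-and-replay idea can be sketched with the Python standard library
alone. This is not VCR.py's implementation or API: the Cassette class and the
injected fetch function are hypothetical, requests are matched by URL only,
and the cassette is stored as JSON rather than YAML to avoid extra
dependencies.

```python
import json
import os


class Cassette:
    """Toy cassette: record responses on first use, replay them afterwards."""

    def __init__(self, path, fetch):
        self.path = path
        self.fetch = fetch  # callable that performs the real request
        self.recorded = {}
        if os.path.exists(path):
            with open(path) as fh:
                self.recorded = json.load(fh)

    def get(self, url):
        # Replay: a known request is answered from the file, with no traffic.
        if url in self.recorded:
            return self.recorded[url]
        # Record: perform the real request once and serialize the response.
        body = self.fetch(url)
        self.recorded[url] = body
        with open(self.path, "w") as fh:
            json.dump(self.recorded, fh)
        return body
```

On the first run the fetch function is called and the cassette file is
written. On later runs the same URL is served from the file, which is what
makes the test offline, deterministic and fast.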