This is my code. It’s a bit of a hack :-) but it has a few RSpec specs in the
specs directory.
Run:
% rspec specs/*
to run them all. They should all pass.
To get actual useful work done, run:
% ruby parse_hn_front_page.rb > results.txt
This writes a dump of what it finds, and what the objects contain, to results.txt.
Obviously, a lot more could be done. I think it would be useful to set up classes
that recognize popular sites (nytimes.com, wsj.com, etc.) that don’t easily give
up their useful text.
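
A rough sketch of how those site classes might look, just to show the idea.
None of these names (GenericExtractor, NYTimesExtractor, WSJExtractor,
extractor_for) exist in the code; they are made up for illustration. The
tag-stripping fallback is only a placeholder, not what parse_hn_front_page.rb
actually does.

```ruby
require "uri"

# Hypothetical fallback: used when no site-specific class claims the host.
class GenericExtractor
  # The generic extractor accepts any host.
  def self.handles?(_host)
    true
  end

  # Placeholder extraction: replace tags with spaces and collapse whitespace.
  def extract(html)
    html.gsub(/<[^>]+>/, " ").squeeze(" ").strip
  end
end

# Hypothetical site-specific class; a real one would override #extract
# with whatever it takes to pull the article text out of that site.
class NYTimesExtractor < GenericExtractor
  def self.handles?(host)
    host == "nytimes.com" || host.end_with?(".nytimes.com")
  end
end

class WSJExtractor < GenericExtractor
  def self.handles?(host)
    host == "wsj.com" || host.end_with?(".wsj.com")
  end
end

# Order matters: specific classes first, the generic fallback last.
EXTRACTORS = [NYTimesExtractor, WSJExtractor, GenericExtractor].freeze

# Pick an extractor based on the URL's host.
# Raises URI::InvalidURIError if the URL cannot be parsed.
def extractor_for(url)
  host = URI.parse(url).host.to_s.downcase
  EXTRACTORS.find { |klass| klass.handles?(host) }.new
end
```

With something like this, supporting another site would mean writing one more
subclass and adding it to the list ahead of GenericExtractor.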