XML hotels catalogue scraping booking.com and web presentation using XSLT transform

Tsvetilin/XMLHotelsCatalogue

XML Hotels Catalogue

Made as a course project for the XML Technologies course at FMI, Sofia University.

The project generates an XML document by scraping booking.com, validates the document with XML Schema, and visualizes the catalogue by transforming it to HTML with XSLT.

Structure

  • Python script for scraping the desired hotels' info and generating the XML document
  • XML Schema for validation of the document
  • Internal DTD entities for the images (course assignment requirement)
  • External DTD grammar (also a course assignment requirement)
  • XSLT transform document
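
As a rough illustration of the first component, the sketch below builds a small catalogue document from already-scraped records and links it to a stylesheet. It is not the project's actual script: the element names (`catalogue`, `hotel`, `name`, `city`, `rating`), the record keys, and the `catalogue.xsl` file name are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def build_catalogue(hotels, stylesheet="catalogue.xsl"):
    """Serialize scraped hotel records into an XML catalogue string.

    Element names and the stylesheet file name are hypothetical;
    adapt them to the schema the project actually validates against.
    """
    root = ET.Element("catalogue")
    for record in hotels:
        hotel = ET.SubElement(root, "hotel", id=record["id"])
        ET.SubElement(hotel, "name").text = record["name"]
        ET.SubElement(hotel, "city").text = record["city"]
        ET.SubElement(hotel, "rating").text = str(record["rating"])

    ET.indent(root)  # pretty-print; requires Python 3.9+
    body = ET.tostring(root, encoding="unicode")

    # ElementTree does not emit the prolog for us, so the XML declaration
    # and the stylesheet processing instruction are prepended by hand.
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<?xml-stylesheet type="text/xsl" href="{stylesheet}"?>\n'
        f"{body}\n"
    )
```

ElementTree escapes special characters such as `&` in text nodes, so hotel names can be passed through as scraped.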

Notes

  • The university course doesn't encourage using modern technologies; it recommends Internet Explorer instead, because it can open referenced files on the local file system
  • The required use of internal entities is also problematic, because the native XML parsing functions in some browsers do not support them
  • The XSLT processor built into browsers' JavaScript engines doesn't support all of the required functions

Usage

  • Modify the Python script so that it scrapes the desired hotels
  • Start a development server to visualize the catalogue properly
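
The README does not say which development server to use. One minimal option, sketched here with Python's standard library only, is to serve the project directory over HTTP so that the browser fetches the XML, DTD and XSLT files from the same origin rather than from the local file system. The `make_server` helper, the default port, and the bind address are assumptions for the example.

```python
import functools
import http.server


def make_server(directory=".", port=8000):
    """Create a local static-file server for the catalogue files.

    The helper name, port and bind address are illustrative choices,
    not part of the project.
    """
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    # Bind to localhost only, so the files are not exposed on the network.
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
```

Calling `make_server().serve_forever()` from the project directory and opening `http://127.0.0.1:8000/` in a browser should then list the files; running `python -m http.server` in that directory is an equivalent one-liner.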

Documentation

Thorough documentation (in Bulgarian) is supplied, describing the main aspects of the project.

Team

License

The project is distributed under the GNU General Public License, version 3 or later (GPLv3+); see the LICENSE file for details.
