Skip to content

Parsing HTML-files from Aktienführer CDs into structured JSON data

License

Notifications You must be signed in to change notification settings

UB-Mannheim/akf-cdparser

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

akf-cdparser

Functionality in a nutshell

akf-cdparser serves the purpose to translate the html-files (Generated by 'Hoppenstedt_decoder') from the Hoppenstedt-Aktienführer CD's (1999-2015-further Versions). It is part of the Aktienführer-Datenarchiv work process.

The files are parsed to categorized json-files, which can be then read by another tool, called akf-dbTools. This tool does then the final step, it loads the json to the sql-database. The following diagram shows the basic workflow of the complete CD-parsing-project and also the position of akf-cdparser within that project.

workflow

Installation

Install Node.js

In command line, go to project folder and type:

npm install

The project was written in Visual Studio Code, so if you are a developer it's recomended to use it.

Commands to use akf-cdparser

Usage: akf-html2json [options]

Options:

-h, --help                output usage information
-o, --outputdir <dir>     Output directory
-i, --inputdir <dir>      Input directory with html files
-p, --print               Print to STDOUT
-n, --parallel <num>      Parallel conversions
-f, --inputfiles <files>  Input files
-l, --lse                 do line segment analysis in the end

The standard for using the software is:

bin/node akf-html2json --inputfiles path\\to\\html\\file.html
bin/node akf-html2json --inputdir   path\\to\\html\\directory

If you use visual-studio-code you can use the launch.json in .vscode-folder to run from vscode with specified parameters. It provides some basic examples which can be commented in.

Program overview

The program consists of many modules. The following diagram shows the different modules, their main purpose and how they are basically connected.
parserdiagram

About

Parsing HTML-files from Aktienführer CDs into structured JSON data

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 4

  •  
  •  
  •  
  •