Her 2031 - Improve login-form submission options #20

Merged
merged 56 commits into from
Mar 26, 2013

Conversation

gojomo
Member

@gojomo gojomo commented Mar 12, 2013

Basic support for detecting and submitting to login forms, primarily via the ExtractorHTMLForms and FormLoginProcessor processors. Configuration examples are in the class Javadocs.

Travis Wellman and others added 30 commits November 14, 2012 17:58
    createRecorder(String,String) - new method for creating test Recorder and specifying charset
    createRecorder(String) - deprecate (use default charset as before)
    use real-use-case content and actual URLs from Facebook for testing
* ExtractorMultipleRegex.java
    some comments, little twiddles
    use groovy templating facility
    getEngine() - fix logic to respect isolateThreads setting
    refactor for readability
    more refactoring to avoid redundant operations
    fix omission from last commit, uriRegex
    working on making test work - first two scroll-down urls are extracted successfully, others fail
    outLinks, outCandidates - use LinkedHashSet to ensure predictable order (any reason not to do that?)
    remove the "fooIndex" thing from available bindings, since it's kinda hacky and turned out not to be needed for our use case
    turns out that __adt parameter can be found near the json blob - most, but not all, expected links are now found
    test passes now; extractor gets what it can, which is most of the scroll down urls
    keep cache of groovy Template objects, since they are expensive to create
    remove temporary performance testing code
    avoid "constant string too long" compile error
    add "implements Closeable" since it already has the close() method
    testLaxUrlEncoding() - Tests a URL not correctly url-encoded, but that heritrix lets pass through to mimic browser behavior.
    include junit as a regular dependency not managed by eclipse, so source jar can be attached
…ng down the test http servers happen only once at start and finish respectively
    runTest() - convenience logging of test failures
    speculativeFixup() - improve detection of scheme-less intended-absolute-URIs
    refactor so considerStrings() is not static, allowing it to be overridden in subclasses
* ExtractorHTML.java, ExtractorJS.java
    add @Autowired parameter extractorJS used to process inline javascript, instead of call to static ExtractorJS.considerStrings()
    makeExtractor() - call setExtractorJS(new ExtractorJS()) so testSpeculativeLinkExtraction() passes
…ses HER-1523)

* UriUtils.java
    new method isVeryLikelyUri() with tighter heuristic than isLikelyUri()
* ExtractorJS.java
    use UriUtils.isVeryLikelyUri(), and change order of operations to do fixup before call to isVeryLikelyUri(), since it doesn't expect strings with javascript escaping and stuff
* StringExtractorTestBase.java
    handle test data with expected value null, meaning no outlinks expected
* ExtractorHTMLTest.java
    avoid redundancy by using the extractor created in ContentExtractorTestBase.setUp()
* ExtractorJSTest.java
    some new tests
    avoid redundancy by using extractor built in setUp()
    makeExtractor() - call extractor.setExtractorJS(new ExtractorJS()) since we got rid of the static ExtractorJS stuff
    testConditionalComment1() - override to skip the test since it fails with JerichoExtractorHTML
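
The considerStrings() refactor described in the commits above (static call replaced by an injected, overridable instance) can be illustrated roughly as follows. This is only a sketch: `JsStringScanner`, `HtmlScanner`, `looksLikeUri()` and `scanInlineScript()` are hypothetical stand-ins, not the actual ExtractorJS/ExtractorHTML code, and the URI check is a placeholder rather than the isVeryLikelyUri() heuristic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical stand-in for a JS string extractor (not the real ExtractorJS). */
class JsStringScanner {
    // Matches single- or double-quoted string literals; deliberately simplistic.
    private static final Pattern QUOTED = Pattern.compile("[\"']([^\"']+)[\"']");

    /** Instance method rather than static, so subclasses can override it. */
    List<String> considerStrings(String js) {
        List<String> found = new ArrayList<>();
        Matcher m = QUOTED.matcher(js);
        while (m.find()) {
            String candidate = m.group(1);
            if (looksLikeUri(candidate)) {
                found.add(candidate);
            }
        }
        return found;
    }

    /** Placeholder heuristic (assumption); subclasses may tighten or loosen it. */
    boolean looksLikeUri(String s) {
        return s.startsWith("http://") || s.startsWith("https://");
    }
}

/** Hypothetical stand-in for an HTML extractor that delegates inline scripts. */
class HtmlScanner {
    // Collaborator is held as a field and replaceable via the setter,
    // instead of being reached through a static method call.
    private JsStringScanner jsScanner = new JsStringScanner();

    void setJsScanner(JsStringScanner jsScanner) {
        this.jsScanner = jsScanner;
    }

    List<String> scanInlineScript(String js) {
        return jsScanner.considerStrings(js);
    }
}
```

With this shape, a test or a subclass can swap in its own scanner through the setter, which a static considerStrings() call would not allow.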
nlevitt and others added 26 commits December 13, 2012 14:11
…ary.INSTANCE.link() with FilesystemLinkMaker.makeHardLink() in doRecover() - contributed by Andrés Aguilar
The new class BeanLookupBindings allows scripts to skip getBean calls.
Lines like 'beanname = appCtx.getBean("beanname")' can be left out.
Past scripts remain compatible.
This change affects action directory scripts and REST console scripts.
Bean lookup may be enabled by setting a variable named beanBindings to
true in the Bindings or in the script.
BeanLookupBindings for simpler script access to beans
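
The bean-lookup behaviour described in this commit message can be sketched as below. This is an illustrative approximation, not the actual BeanLookupBindings class: `FallbackBindings` is a hypothetical name, and the `Function<String, Object>` lookup is an assumed stand-in for `appCtx.getBean(...)` so that the example needs no Spring dependency.

```java
import java.util.function.Function;
import javax.script.SimpleBindings;

/**
 * Hypothetical sketch: script bindings that fall back to a bean lookup
 * when a name is not bound, but only if "beanBindings" is set to true.
 */
class FallbackBindings extends SimpleBindings {
    // Assumed stand-in for appCtx.getBean(name); returns null if no such bean.
    private final Function<String, Object> beanLookup;

    FallbackBindings(Function<String, Object> beanLookup) {
        this.beanLookup = beanLookup;
    }

    /** Lookup is opt-in, so scripts that do not set the flag behave as before. */
    private boolean lookupEnabled() {
        return Boolean.TRUE.equals(super.get("beanBindings"));
    }

    @Override
    public Object get(Object key) {
        Object value = super.get(key);
        if (value == null && key instanceof String && lookupEnabled()) {
            value = beanLookup.apply((String) key);
        }
        return value;
    }

    @Override
    public boolean containsKey(Object key) {
        if (super.containsKey(key)) {
            return true;
        }
        return key instanceof String
                && lookupEnabled()
                && beanLookup.apply((String) key) != null;
    }
}
```

Explicitly bound variables still take precedence over beans of the same name in this sketch, which is one way to keep past scripts compatible.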
@nlevitt nlevitt merged commit c5c54ac into master Mar 26, 2013