
No warnings/exceptions when data truncated #2

Open
wizofaus opened this issue Jul 22, 2016 · 18 comments

Comments

@wizofaus

It would be good (especially for dollar amounts!) to have some sort of warning/error mechanism for values that are too long/too big. This includes when generating the FileTotalRecord.

@stuarta0
Owner

For the Westpac QuickSuper format there is a separate Validator class that encompasses all the required rules. Implementing the same behaviour for ABA would mean you could do something like this:

var errors = new ABA.Validator().Validate(myFile);

The reason I'd lean towards this over exceptions is that you can continue using/manipulating the data within the classes until crunch time (writing the file) at which point you could run the validator and decide what to do gracefully. Potentially I could add the validator check to the ABAFileIO class and if you try to write the file while there are errors (at least for amount overflow) it'll throw an exception in that case.

Thoughts?
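A minimal sketch of that error-collecting approach (the class name, field limits and messages here are assumptions for illustration, not the library's actual API):

```csharp
using System.Collections.Generic;

// Hypothetical sketch: return errors rather than throwing, so the data
// can keep being manipulated until it's time to write the file.
class Record
{
    public decimal Amount;
    public string Title;
}

class ValidatorSketch
{
    const decimal MaxAmount = 99999999.99m; // 10-digit cents amount field

    public IEnumerable<string> Validate(IEnumerable<Record> records)
    {
        foreach (var r in records)
        {
            if (r.Amount < 0m || r.Amount > MaxAmount)
                yield return string.Format("Amount {0} would be truncated", r.Amount);
            if (r.Title != null && r.Title.Length > 32)
                yield return string.Format("Title '{0}' is too long", r.Title);
        }
    }
}
```

The caller can then inspect the returned errors at write time and decide how to proceed, rather than being interrupted mid-manipulation by an exception.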

@wizofaus
Author

I’d like to see an exception when trying to write out the file. It’d also be good to have a function that would let me know whether another DetailRecord could be added without exceeding the max possible total.

@wizofaus
Author

It’d also be nice if the library had a way of checking that e.g. a lodgement description only contained valid characters (http://help.westpac.com.au/help/content/col/documents/pdfs/olpimportaude.pdf seems to list them). Interestingly, it seems that the best character encoding is 1252 (SBCSCodePageEncoding) – the Unicode cents character (U+00A2) and ‘not’ character (U+00AC) are listed as supported characters, but all the ABA examples I’ve seen have been ASCII (despite no spec explicitly stating this), so I assume this should be 0xA2/0xAC. I can only assume any non-printable characters (< ASCII 32, e.g. newline) should be stripped out or converted to spaces etc. I noticed in the existing stream serialization code the codepage is not specified, which may mean if the system default has been changed you’ll get incorrect behavior for characters over ASCII 127.
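The strip-or-convert idea could be sketched like this (a guess at sensible behaviour, not anything mandated by the ABA spec; the helper name is hypothetical):

```csharp
using System.Text;

static class FieldSanitizer
{
    // Convert control characters (below ASCII 32, e.g. newlines) to
    // spaces, as suggested above. Characters at or above ASCII 32 are
    // passed through unchanged.
    public static string Sanitize(string field)
    {
        var sb = new StringBuilder(field.Length);
        foreach (char c in field)
            sb.Append(c < ' ' ? ' ' : c);
        return sb.ToString();
    }
}
```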

stuarta0 added a commit that referenced this issue Jul 22, 2016
Added some initial tests to address issues raised in #2
@stuarta0
Owner

Currently making changes in iss2-validation branch. In regard to the character encoding it's interesting because the linked document says to use EBCDIC and then proceeds to list ALL valid characters including characters that can't be represented using EBCDIC (according to the Wikipedia page anyway). I haven't looked into 1252.

Likewise I've never seen a problem using ASCII. For now I've added in validation using <127 ASCII (i.e. excluding cents char and 'not' char) containing the listed valid characters. While not complete, it will be valid as it's a subset of the characters given.
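A whitelist check in the spirit of that "<127 ASCII" validation might look like this (the exact character set below is an assumption for illustration, not the authoritative BECS list):

```csharp
static class CharsetValidator
{
    // Conservative sub-128 whitelist: letters, digits, space and common
    // printable punctuation. Anything outside it fails validation.
    const string Allowed =
        " ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789" +
        "!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~";

    public static bool IsValid(string field)
    {
        foreach (char c in field)
            if (Allowed.IndexOf(c) < 0)
                return false;
        return true;
    }
}
```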

@wizofaus
Author

https://www.nab.com.au/content/dam/nabconnectcontent/file-formats/NAB%20Connect%20Consolidated%20File%20Format%20Specification_V0.05.pdf has different valid characters again, but at least says to use ASCII character encoding. I think 1252 is the best codepage to use for extended ASCII, and it’s the default on just about all systems configured for English-speaking countries. If you explicitly use the .NET ASCIIEncoding you lose the cents and not characters (anything over code point 127, I guess). But perhaps allow the user to override it. (As it happens I’m not using your library to write the bytes to disk; I’m uploading the data to an SFTP server, so I do my own encoding.)
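The difference is easy to demonstrate. A small sketch (ISO-8859-1 is used below because it agrees with Windows-1252 at 0xA2 and 0xAC, since the two code pages differ only in the 0x80-0x9F range, and it is available without registering extra code-page providers on newer runtimes):

```csharp
using System.Text;

static class EncodingDemo
{
    // ASCIIEncoding cannot represent code points above 127; by default
    // they degrade to '?' (0x3F) on encoding.
    public static byte[] AsAscii(string s)
    {
        return Encoding.ASCII.GetBytes(s);
    }

    // ISO-8859-1 maps the cents character (U+00A2) to 0xA2 and the
    // not sign (U+00AC) to 0xAC, matching Windows-1252 at those positions.
    public static byte[] AsLatin1(string s)
    {
        return Encoding.GetEncoding("ISO-8859-1").GetBytes(s);
    }
}
```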

@stuarta0
Owner

Yes that was the idea behind separating the concerns of IO to another class; so consumers could use it for convenience or roll their own.

In regard to overflowing the amount or count of records, I'm wondering whether I should expand the class structure to understand the reel number concept and handle it automatically, or whether it's up to the consumer to do so. Since it sounds like you might be encountering this limit, would you prefer to just handle it manually (creating new AbaFile objects as you go), or be able to add as many detail records as you like and have the AbaFile class automatically generate the required data structure for writing? I haven't thought this through yet so I can't say exactly how it would do it.

@wizofaus
Author

Well at this stage I’m not that likely to hit the 99 million dollar limit, but I would like the certainty that if it does happen I don’t just end up sending off a file with the amounts incorrectly truncated. Like I said before, a function to query “can I safely add this record” would be sufficient. But sure, the ability to have the library automatically split it up with multiple reel numbers etc. would be great for those who definitely need that functionality. It does sadden me that in this day and age we’re still using file formats like this. It’s incredible that there’s no unique ID (or even a timestamp) for each record, so I really don’t know how banks prevent transactions from being duplicated when there’s been some sort of network partitioning issue (e.g. the file’s been uploaded but the client dies before it can update its internal state, so it thinks it needs to try again).

@stuarta0
Owner

Yep it's pretty archaic. It's also surprisingly hard to find the original document explaining the spec. This might be it: http://www.apca.com.au/docs/default-source/payment-systems/becs_procedures.pdf

Page 70 (12.3) refers to EBCDIC. Page 86 (C7.1) contains the character list. It would also appear the format is 22 years old. Hooray!

@stuarta0
Owner

stuarta0 commented Aug 4, 2016

OK I've added examples in the readme for 824c2b8 that shows briefly how the code in the iss2-validation branch works.

The AbaFileValidator class contains a CanAdd() method that allows you to check whether the addition of one or more detail records will invalidate the file. There are also Clean() methods on all the validators to attempt to clean the data, including stripping characters outside the required character sets, summing total records and truncating strings. If any of the corrections can't be applied automatically (e.g. keeping a currency field from truncating) an exception will be thrown from Clean(). Alternatively, calling Validate() will return a collection of all exceptions explaining the invalid data.

If you don't like how the built-in validators work, you can mix and match the individual classes. For example, you could provide your own character set validators that have a given character set and throw exceptions if they exceed field lengths. Or you could just use the currency validators and ignore the rest.

There's still a couple of test cases to add and a validator for the DescriptiveRecord before I roll it into master.
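A sketch of how the CanAdd() check described above might be used; the stand-in classes below are hypothetical so the example is self-contained, and the real AbaFileValidator in the iss2-validation branch may have different signatures:

```csharp
using System.Collections.Generic;

// Hypothetical stand-ins, not the library's actual types.
class DetailRecord { public decimal Amount; }
class AbaFile { public List<DetailRecord> DetailRecords = new List<DetailRecord>(); }

class AbaFileValidatorSketch
{
    const decimal MaxTotal = 99999999.99m; // 10-digit cents total field

    // CanAdd-style check: would adding this record overflow the file total?
    public bool CanAdd(AbaFile file, DetailRecord record)
    {
        decimal total = record.Amount;
        foreach (var r in file.DetailRecords)
            total += r.Amount;
        return total <= MaxTotal;
    }
}
```

Calling CanAdd before each Add gives the "can I safely add this record" query requested earlier in the thread, without ever constructing an invalid file.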

@wizofaus
Author

wizofaus commented Aug 4, 2016

Sounds good, although I think “Clean” is a bit of an odd function name for a class called ‘validator’. How does it handle negative numbers? Did you also fix the code that calculates the net amount, and allow the indicator field to be empty?

@wizofaus
Author

wizofaus commented Aug 4, 2016

    foreach (var e in base.Validate(total))
        return false;
    return true;

Rather odd bit of code…why not just

return !base.Validate(total).Any();

?

@stuarta0
Owner

stuarta0 commented Aug 4, 2016

Any() is a Linq expression and the library is vanilla .NET 2.0. I'm not going to bother pulling in LINQ Bridge to get that one line of code working another way. In any case, it's an IEnumerable returned via yield, so while a tad quirky I don't see there being any performance issue with it.

Personal taste as to whether clean should be contained within something named 'validator'. I see the two processes as intrinsically linked, since checking and correcting bad data are closely aligned. The only thing I'd change is separating out an ICleaner interface to remove Clean from the IValidator contract.

@wizofaus
Author

wizofaus commented Aug 4, 2016

Well .Any() is just .GetEnumerator().MoveNext()…
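That observation could be packaged as a LINQ-free helper for .NET 2.0 targets (a sketch; the helper name is hypothetical):

```csharp
using System.Collections.Generic;

static class EnumerableUtil
{
    // LINQ-free equivalent of Any(): advance the enumerator once.
    // With yield-based validators this still stops at the first
    // error produced, so no extra work is done.
    public static bool HasAny<T>(IEnumerable<T> source)
    {
        using (IEnumerator<T> e = source.GetEnumerator())
            return e.MoveNext();
    }
}
```

With that in place, the quoted loop could read `return !EnumerableUtil.HasAny(base.Validate(total));`.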

@stuarta0
Owner

stuarta0 commented Aug 4, 2016

You're welcome to update it and submit a PR :)

@wizofaus
Author

wizofaus commented Aug 4, 2016

Because of how I’m using that source I’ve had to add an attribution to the top of each source file – I’m not sure if you want that…

@stuarta0
Owner

stuarta0 commented Aug 5, 2016

Probably not. You'd clone, make a branch for the PR, make the changes and submit the PR. After that it's up to you. If it were me, I'd have a branch in my other repo that has any customisations to the base library for my purposes. When the base library changes (e.g. a PR is merged), I'd pull the resulting changes and merge with my custom branch.

@stuarta0
Owner

stuarta0 commented Aug 5, 2016

I will admit I didn't think a LOC like that would be an issue. Creating a dependency on FileHelpers when you're not using it, or the architecture decisions, or the unit test coverage, or performance issues would be a bigger concern IMO. Especially when the code loops over the detail records three times when validating: once each for debit, credit and net totals, for isolation. Now if that's a concern (and I'd suggest it's a lot more important than the issue raised), you can quite easily write another validator that combines all this logic into one loop (and write your test cases for it). The building blocks are all there (and tested) for you to do whatever you need.

@wizofaus
Author

wizofaus commented Aug 9, 2016

BTW I just found that CurrencyConverter.FieldToString is barfing on 30 million (presumably on anything over ~21.47 million), even though DE files can hold values up to (but not including) 100 million. Changed from int to long.
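That boundary is consistent with cents held in a 32-bit int: Int32.MaxValue is 2,147,483,647, i.e. about $21.47 million in cents. A sketch of the failure and the fix (the helpers below are hypothetical stand-ins, not the library's actual CurrencyConverter):

```csharp
static class CurrencySketch
{
    // Storing cents in a 32-bit int overflows just above $21,474,836.47,
    // even though ABA amount fields allow up to $99,999,999.99.
    public static int CentsAsInt(decimal dollars)
    {
        return (int)(dollars * 100m); // throws OverflowException past ~$21.47M
    }

    // A long comfortably covers the full ABA range.
    public static long CentsAsLong(decimal dollars)
    {
        return (long)(dollars * 100m);
    }
}
```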

