hasPattern I think is broken #152
I would say a solution would be something like the following, which existed in previous versions:

```python
def hasPattern(self, column, pattern, assertion=None, name=None, hint=None):
    """
    Checks for pattern compliance. Given a column name and a regular expression,
    defines a Check on the average compliance of the column's values with the
    regular expression.

    :param str column: Column in DataFrame to be checked.
    :param str pattern: The regular expression the column's values are matched against.
    :param lambda assertion: A function with an int or float parameter.
    :param str name: A name for the pattern constraint.
    :param str hint: A hint that states why a constraint could have failed.
    :return: hasPattern self: A Check object that runs the condition on the column.
    """
    assertion_func = ScalaFunction1(
        self._spark_session.sparkContext._gateway,
        assertion if assertion else lambda x: x == 1,
    )
    name = self._jvm.scala.Option.apply(name)
    hint = self._jvm.scala.Option.apply(hint)
    pattern_regex = self._jvm.scala.util.matching.Regex(pattern, None)
    self._Check = self._Check.hasPattern(column, pattern_regex, assertion_func, name, hint)
    return self
```
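The "average compliance" semantics behind `hasPattern` can be sketched in plain Python. This is a hypothetical helper, not part of pydeequ; it only illustrates what the check computes: the fraction of column values matching the regex, passed to the assertion (which defaults to requiring full compliance, `x == 1`). Whether Deequ anchors the match is an implementation detail on the JVM side; this sketch uses unanchored `search`.

```python
import re

def pattern_compliance(values, pattern, assertion=lambda x: x == 1):
    """Return (compliance ratio, assertion result) for a list of values.

    Hypothetical plain-Python sketch of the metric Check.hasPattern
    evaluates on the JVM; not part of the pydeequ API.
    """
    regex = re.compile(pattern)
    # Count values containing a match for the pattern.
    matched = sum(1 for v in values if regex.search(str(v)) is not None)
    ratio = matched / len(values) if values else 0.0
    return ratio, assertion(ratio)

# Every value is a hex digit, so compliance is 1.0 and the default
# assertion (x == 1) passes.
ratio, passed = pattern_compliance(["a", "F", "3"], r"[0-9a-fA-F]")
```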
I just merged #66, which should address this. Pending CI passing on master; feel free to test again.
Hello, thanks for your quick response. I get this error now when I use the new implementation, @chenliu0831:

```
---------------------------------------------------------------------------
Py4JError                                 Traceback (most recent call last)
Cell In[17], line 6
      1 check = Check(spark, CheckLevel.Warning, "Review Check")
      3 checkResult = (VerificationSuite(spark)
      4     .onData(orders_reference_mock)
      5     .addCheck(
----> 6         check
      7         .hasPattern(column = "concept_id", pattern="[0-9a-fA-F]")
      8         .isUnique("id")
      9         .hasPattern(column = "id", pattern=r"[0-9a-fA-F]{8}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{12}")
     10         .hasMin("gtv", lambda x: x == 30.0)
     11         .hasMax("gtv", lambda x: x == 50.0)
     12     )
     13     .run())
     15 checkResult_df = VerificationResult.checkResultsAsDataFrame(spark, checkResult)

File ~/.local/lib/python3.10/site-packages/pydeequ/checks.py:568, in Check.hasPattern(self, column, pattern, assertion, name, hint)
    554 def hasPattern(self, column, pattern, assertion=None, name=None, hint=None):
    555     """
    556     Checks for pattern compliance. Given a column name and a regular expression, defines a
    557     Check on the average compliance of the column's values to the regular expression.
   (...)
    565     :return: hasPattern self: A Check object that runs the condition on the column.
    566     """
    567     assertion_func = ScalaFunction1(self._spark_session.sparkContext._gateway, assertion) if assertion \
--> 568         else getattr(self._Check, "hasPattern$default$2")()
    569     name = self._jvm.scala.Option.apply(name)
    570     hint = self._jvm.scala.Option.apply(hint)

File /pyenv/versions/3.10.11/lib/python3.10/site-packages/py4j/java_gateway.py:1321, in JavaMember.__call__(self, *args)
   1315 command = proto.CALL_COMMAND_NAME +\
   1316     self.command_header +\
   1317     args_command +\
   1318     proto.END_COMMAND_PART
   1320 answer = self.gateway_client.send_command(command)
-> 1321 return_value = get_return_value(
   1322     answer, self.gateway_client, self.target_id, self.name)
   1324 for temp_arg in temp_args:
   1325     temp_arg._detach()

File /pyenv/versions/3.10.11/lib/python3.10/site-packages/pyspark/sql/utils.py:190, in capture_sql_exception.<locals>.deco(*a, **kw)
    188 def deco(*a: Any, **kw: Any) -> Any:
    189     try:
--> 190         return f(*a, **kw)
    191     except Py4JJavaError as e:
    192         converted = convert_exception(e.java_exception)

File /pyenv/versions/3.10.11/lib/python3.10/site-packages/py4j/protocol.py:330, in get_return_value(answer, gateway_client, target_id, name)
    326     raise Py4JJavaError(
    327         "An error occurred while calling {0}{1}{2}.\n".
    328         format(target_id, ".", name), value)
    329 else:
--> 330     raise Py4JError(
    331         "An error occurred while calling {0}{1}{2}. Trace:\n{3}\n".
    332         format(target_id, ".", name, value))
    333 else:
    334     raise Py4JError(
    335         "An error occurred while calling {0}{1}{2}".
    336         format(target_id, ".", name))

Py4JError: An error occurred while calling o122.hasPattern$default$2. Trace:
py4j.Py4JException: Method hasPattern$default$2([]) does not exist
	at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:318)
	at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:326)
	at py4j.Gateway.invoke(Gateway.java:274)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
	at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
	at java.base/java.lang.Thread.run(Unknown Source)
```
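For context (my reading of the error, not confirmed against the pydeequ sources): the Scala compiler emits each default parameter value as a synthetic method named `<method>$default$<n>` on the class, and the Python wrapper tries to call that accessor through py4j. If the Scala signature has no default for that parameter position, the synthetic method was never generated, and py4j reports exactly this "Method ... does not exist" failure at call time. A pure-Python analogy of that failure mode, with no JVM involved (`JvmCheckStub` is a made-up stand-in):

```python
# Hypothetical stand-in for the py4j proxy of the Scala Check object.
# It deliberately defines no attribute named "hasPattern$default$2",
# mirroring a Scala method with no default for parameter 2.
class JvmCheckStub:
    pass

stub = JvmCheckStub()
try:
    # pydeequ does the equivalent lookup through py4j; here the missing
    # synthetic accessor surfaces as AttributeError instead of Py4JError.
    getattr(stub, "hasPattern$default$2")()
    default_found = True
except AttributeError:
    default_found = False
```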
Oh, never mind, apparently assertion needs to be set:

```python
.hasPattern(column = "concept_id",
            pattern="[0-9a-fA-F]{8}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{12}",
            assertion=lambda x: x == 1/1)
```
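As a quick standalone sanity check of the UUID-shaped pattern used in the workaround above (plain Python `re`, independent of pydeequ; the `\-` escapes in the original are harmless but unnecessary, since `-` needs no escaping outside a character class):

```python
import re

# Same 8-4-4-4-12 hex-group pattern as in the check, without the
# redundant backslash escapes before the hyphens.
UUID_PATTERN = r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"

# A well-formed UUID string matches in full; arbitrary text does not.
good = re.fullmatch(UUID_PATTERN, "123e4567-e89b-12d3-a456-426614174000")
bad = re.fullmatch(UUID_PATTERN, "not-a-uuid")
```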
@gerileka nice! The error message seems obscure in that case, like a red herring. I will start planning the next release this weekend.
@chenliu0831 The pull request has been merged. Do you think a new tag will be created soon to generate a new version on PyPI? |
Yes, this seems like an important bug fix. Doing a release now.
Released to PyPI - https://pypi.org/project/pydeequ/1.1.1/. Closing.
I am trying to follow this tutorial using the master version of the package:
python-deequ/tutorials/hasPattern_check.ipynb (line 14 in aff4be6)

Running the following line spits out the following problem:

This comes from the fact that hasPattern is really empty as a function:
python-deequ/pydeequ/checks.py (line 554 in aff4be6)

Is this function supported anymore?