Support matrix updates for 22.06 #500

Merged
merged 1 commit on Aug 5, 2022

Conversation

nvidia-merlin-bot
Contributor

Updates from containers

@mikemckiernan mikemckiernan self-assigned this Aug 5, 2022
@mikemckiernan mikemckiernan added the documentation Improvements or additions to documentation label Aug 5, 2022
@github-actions

github-actions bot commented Aug 5, 2022

Documentation preview

https://nvidia-merlin.github.io/Merlin/review/pr-500

@nvidia-merlin-bot
Contributor Author

CI Results
GitHub pull request #500 of commit 894bdd7e274b7a2aed78879d21a7024a0b9aaed4, no merge conflicts.
Running as SYSTEM
Setting status of 894bdd7e274b7a2aed78879d21a7024a0b9aaed4 to PENDING with url https://10.20.13.93:8080/job/merlin_merlin/311/console and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_merlin
using credential systems-login
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/Merlin # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/Merlin
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/Merlin +refs/pull/500/*:refs/remotes/origin/pr/500/* # timeout=10
 > git rev-parse 894bdd7e274b7a2aed78879d21a7024a0b9aaed4^{commit} # timeout=10
Checking out Revision 894bdd7e274b7a2aed78879d21a7024a0b9aaed4 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 894bdd7e274b7a2aed78879d21a7024a0b9aaed4 # timeout=10
Commit message: "Updates from containers"
 > git rev-list --no-walk b4778617b5773025938374d114146ced798387a5 # timeout=10
[merlin_merlin] $ /bin/bash /tmp/jenkins5833342634996700361.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/merlin_merlin/merlin
plugins: anyio-3.6.1, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 3 items

tests/unit/test_version.py . [ 33%]
tests/unit/examples/test_building_deploying_multi_stage_RecSys.py . [ 66%]
tests/unit/examples/test_scaling_criteo_merlin_models.py . [100%]

======================== 3 passed in 238.99s (0:03:58) =========================
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://github.com/gitapi/repos/NVIDIA-Merlin/Merlin/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_merlin] $ /bin/bash /tmp/jenkins17142398986017548565.sh
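
For reference, a minimal sketch for reproducing this test session outside Jenkins, assuming a NVIDIA-Merlin/Merlin checkout with pytest and the notebook test dependencies installed; the paths are the three test files collected above:

# Minimal local reproduction of the CI test session above (assumes a
# NVIDIA-Merlin/Merlin checkout with pytest and the notebook test
# dependencies installed). Paths match the three collected test files.
import sys
import pytest

exit_code = pytest.main([
    "tests/unit/test_version.py",
    "tests/unit/examples/test_building_deploying_multi_stage_RecSys.py",
    "tests/unit/examples/test_scaling_criteo_merlin_models.py",
    "-v",
])
sys.exit(exit_code)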

@nvidia-merlin-bot
Contributor Author

CI Results
GitHub pull request #500 of commit 371f0d16520f98f994078662ba619f081a2d6fe7, no merge conflicts.
Running as SYSTEM
Setting status of 371f0d16520f98f994078662ba619f081a2d6fe7 to PENDING with url https://10.20.13.93:8080/job/merlin_merlin/313/console and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_merlin
using credential systems-login
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/Merlin # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/Merlin
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/Merlin +refs/pull/500/*:refs/remotes/origin/pr/500/* # timeout=10
 > git rev-parse 371f0d16520f98f994078662ba619f081a2d6fe7^{commit} # timeout=10
Checking out Revision 371f0d16520f98f994078662ba619f081a2d6fe7 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 371f0d16520f98f994078662ba619f081a2d6fe7 # timeout=10
Commit message: "Updates from containers"
 > git rev-list --no-walk 290fecb3cee72548a2551bc00c0d5e75a32c2d4d # timeout=10
[merlin_merlin] $ /bin/bash /tmp/jenkins3074948821570825790.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/merlin_merlin/merlin
plugins: anyio-3.6.1, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 3 items

tests/unit/test_version.py . [ 33%]
tests/unit/examples/test_building_deploying_multi_stage_RecSys.py F [ 66%]
tests/unit/examples/test_scaling_criteo_merlin_models.py . [100%]

=================================== FAILURES ===================================
__________________________________ test_func ___________________________________

self = <testbook.client.TestbookNotebookClient object at 0x7f93a33b8f10>
cell = [53], kwargs = {}, cell_indexes = [53], executed_cells = [], idx = 53

def execute_cell(self, cell, **kwargs) -> Union[Dict, List[Dict]]:
    """
    Executes a cell or list of cells
    """
    if isinstance(cell, slice):
        start, stop = self._cell_index(cell.start), self._cell_index(cell.stop)
        if cell.step is not None:
            raise TestbookError('testbook does not support step argument')

        cell = range(start, stop + 1)
    elif isinstance(cell, str) or isinstance(cell, int):
        cell = [cell]

    cell_indexes = cell

    if all(isinstance(x, str) for x in cell):
        cell_indexes = [self._cell_index(tag) for tag in cell]

    executed_cells = []
    for idx in cell_indexes:
        try:
          cell = super().execute_cell(self.nb['cells'][idx], idx, **kwargs)

/usr/local/lib/python3.8/dist-packages/testbook/client.py:133:


args = (<testbook.client.TestbookNotebookClient object at 0x7f93a33b8f10>, {'id': 'b06ccb2e', 'cell_type': 'code', 'metadata'...ast.py, line 299 in transform>]"\n\nAt:\n /tmp/examples/poc_ensemble/3_queryfeast/1/model.py(122): execute\n']}]}, 53)
kwargs = {}

def wrapped(*args, **kwargs):
  return just_run(coro(*args, **kwargs))

/usr/local/lib/python3.8/dist-packages/nbclient/util.py:85:


coro = <coroutine object NotebookClient.async_execute_cell at 0x7f93a390c4c0>

def just_run(coro: Awaitable) -> Any:
    """Make the coroutine run, even if there is an event loop running (using nest_asyncio)"""
    try:
        loop = asyncio.get_running_loop()
    except RuntimeError:
        loop = None
    if loop is None:
        had_running_loop = False
        loop = asyncio.new_event_loop()
        asyncio.set_event_loop(loop)
    else:
        had_running_loop = True
    if had_running_loop:
        # if there is a running loop, we patch using nest_asyncio
        # to have reentrant event loops
        check_ipython()
        import nest_asyncio

        nest_asyncio.apply()
        check_patch_tornado()
  return loop.run_until_complete(coro)

/usr/local/lib/python3.8/dist-packages/nbclient/util.py:60:


self = <_UnixSelectorEventLoop running=False closed=False debug=False>
future = <Task finished name='Task-369' coro=<NotebookClient.async_execute_cell() done, defined at /usr/local/lib/python3.8/dis...ps/feast.py, line 299 in transform>]"\n\nAt:\n /tmp/examples/poc_ensemble/3_queryfeast/1/model.py(122): execute\n\n')>

def run_until_complete(self, future):
    """Run until the Future is done.

    If the argument is a coroutine, it is wrapped in a Task.

    WARNING: It would be disastrous to call run_until_complete()
    with the same coroutine twice -- it would wrap it in two
    different Tasks and that can't be good.

    Return the Future's result, or raise its exception.
    """
    self._check_closed()
    self._check_running()

    new_task = not futures.isfuture(future)
    future = tasks.ensure_future(future, loop=self)
    if new_task:
        # An exception is raised if the future didn't complete, so there
        # is no need to log the "destroy pending task" message
        future._log_destroy_pending = False

    future.add_done_callback(_run_until_complete_cb)
    try:
        self.run_forever()
    except:
        if new_task and future.done() and not future.cancelled():
            # The coroutine raised a BaseException. Consume the exception
            # to not log a warning, the caller doesn't have access to the
            # local task.
            future.exception()
        raise
    finally:
        future.remove_done_callback(_run_until_complete_cb)
    if not future.done():
        raise RuntimeError('Event loop stopped before Future completed.')
  return future.result()

/usr/lib/python3.8/asyncio/base_events.py:616:


self = <testbook.client.TestbookNotebookClient object at 0x7f93a33b8f10>
cell = {'id': 'b06ccb2e', 'cell_type': 'code', 'metadata': {'execution': {'iopub.status.busy': '2022-08-05T01:03:05.611725Z',...ps/feast.py, line 299 in transform>]"\n\nAt:\n /tmp/examples/poc_ensemble/3_queryfeast/1/model.py(122): execute\n']}]}
cell_index = 53, execution_count = None, store_history = True

async def async_execute_cell(
    self,
    cell: NotebookNode,
    cell_index: int,
    execution_count: t.Optional[int] = None,
    store_history: bool = True,
) -> NotebookNode:
    """
    Executes a single code cell.

    To execute all cells see :meth:`execute`.

    Parameters
    ----------
    cell : nbformat.NotebookNode
        The cell which is currently being processed.
    cell_index : int
        The position of the cell within the notebook object.
    execution_count : int
        The execution count to be assigned to the cell (default: Use kernel response)
    store_history : bool
        Determines if history should be stored in the kernel (default: False).
        Specific to ipython kernels, which can store command histories.

    Returns
    -------
    output : dict
        The execution output payload (or None for no output).

    Raises
    ------
    CellExecutionError
        If execution failed and should raise an exception, this will be raised
        with defaults about the failure.

    Returns
    -------
    cell : NotebookNode
        The cell which was just processed.
    """
    assert self.kc is not None

    await run_hook(self.on_cell_start, cell=cell, cell_index=cell_index)

    if cell.cell_type != 'code' or not cell.source.strip():
        self.log.debug("Skipping non-executing cell %s", cell_index)
        return cell

    if self.skip_cells_with_tag in cell.metadata.get("tags", []):
        self.log.debug("Skipping tagged cell %s", cell_index)
        return cell

    if self.record_timing:  # clear execution metadata prior to execution
        cell['metadata']['execution'] = {}

    self.log.debug("Executing cell:\n%s", cell.source)

    cell_allows_errors = (not self.force_raise_errors) and (
        self.allow_errors or "raises-exception" in cell.metadata.get("tags", [])
    )

    await run_hook(self.on_cell_execute, cell=cell, cell_index=cell_index)
    parent_msg_id = await ensure_async(
        self.kc.execute(
            cell.source, store_history=store_history, stop_on_error=not cell_allows_errors
        )
    )
    await run_hook(self.on_cell_complete, cell=cell, cell_index=cell_index)
    # We launched a code cell to execute
    self.code_cells_executed += 1
    exec_timeout = self._get_timeout(cell)

    cell.outputs = []
    self.clear_before_next_output = False

    task_poll_kernel_alive = asyncio.ensure_future(self._async_poll_kernel_alive())
    task_poll_output_msg = asyncio.ensure_future(
        self._async_poll_output_msg(parent_msg_id, cell, cell_index)
    )
    self.task_poll_for_reply = asyncio.ensure_future(
        self._async_poll_for_reply(
            parent_msg_id, cell, exec_timeout, task_poll_output_msg, task_poll_kernel_alive
        )
    )
    try:
        exec_reply = await self.task_poll_for_reply
    except asyncio.CancelledError:
        # can only be cancelled by task_poll_kernel_alive when the kernel is dead
        task_poll_output_msg.cancel()
        raise DeadKernelError("Kernel died")
    except Exception as e:
        # Best effort to cancel request if it hasn't been resolved
        try:
            # Check if the task_poll_output is doing the raising for us
            if not isinstance(e, CellControlSignal):
                task_poll_output_msg.cancel()
        finally:
            raise

    if execution_count:
        cell['execution_count'] = execution_count
    await run_hook(
        self.on_cell_executed, cell=cell, cell_index=cell_index, execute_reply=exec_reply
    )
  await self._check_raise_for_error(cell, cell_index, exec_reply)

/usr/local/lib/python3.8/dist-packages/nbclient/client.py:1022:


self = <testbook.client.TestbookNotebookClient object at 0x7f93a33b8f10>
cell = {'id': 'b06ccb2e', 'cell_type': 'code', 'metadata': {'execution': {'iopub.status.busy': '2022-08-05T01:03:05.611725Z',...ps/feast.py, line 299 in transform>]"\n\nAt:\n /tmp/examples/poc_ensemble/3_queryfeast/1/model.py(122): execute\n']}]}
cell_index = 53
exec_reply = {'buffers': [], 'content': {'ename': 'InferenceServerException', 'engine_info': {'engine_id': -1, 'engine_uuid': '2f11...e, 'engine': '2f113c00-e31c-4529-8c8e-49681ed84cdf', 'started': '2022-08-05T01:03:05.612039Z', 'status': 'error'}, ...}

async def _check_raise_for_error(
    self, cell: NotebookNode, cell_index: int, exec_reply: t.Optional[t.Dict]
) -> None:

    if exec_reply is None:
        return None

    exec_reply_content = exec_reply['content']
    if exec_reply_content['status'] != 'error':
        return None

    cell_allows_errors = (not self.force_raise_errors) and (
        self.allow_errors
        or exec_reply_content.get('ename') in self.allow_error_names
        or "raises-exception" in cell.metadata.get("tags", [])
    )
    await run_hook(
        self.on_cell_error, cell=cell, cell_index=cell_index, execute_reply=exec_reply
    )
    if not cell_allows_errors:
      raise CellExecutionError.from_cell_and_msg(cell, exec_reply_content)

E nbclient.exceptions.CellExecutionError: An error occurred while executing the following cell:
E ------------------
E
E import shutil
E from merlin.models.loader.tf_utils import configure_tensorflow
E configure_tensorflow()
E from merlin.systems.triton.utils import run_ensemble_on_tritonserver
E response = run_ensemble_on_tritonserver(
E "/tmp/examples/poc_ensemble", outputs, request, "ensemble_model"
E )
E response = [x.tolist()[0] for x in response["ordered_ids"]]
E shutil.rmtree("/tmp/examples/", ignore_errors=True)
E
E ------------------
E
E ---------------------------------------------------------------------------
E InferenceServerException                 Traceback (most recent call last)
E Input In [32], in <cell line: 5>()
E       3 configure_tensorflow()
E       4 from merlin.systems.triton.utils import run_ensemble_on_tritonserver
E ----> 5 response = run_ensemble_on_tritonserver(
E       6     "/tmp/examples/poc_ensemble", outputs, request, "ensemble_model"
E       7 )
E       8 response = [x.tolist()[0] for x in response["ordered_ids"]]
E       9 shutil.rmtree("/tmp/examples/", ignore_errors=True)
E
E File /usr/local/lib/python3.8/dist-packages/merlin/systems/triton/utils.py:93, in run_ensemble_on_tritonserver(tmpdir, output_columns, df, model_name)
E      91 response = None
E      92 with run_triton_server(tmpdir) as client:
E ---> 93     response = send_triton_request(df, output_columns, client=client, triton_model=model_name)
E      95 return response
E
E File /usr/local/lib/python3.8/dist-packages/merlin/systems/triton/utils.py:141, in send_triton_request(df, outputs_list, client, endpoint, request_id, triton_model)
E     139 outputs = [grpcclient.InferRequestedOutput(col) for col in outputs_list]
E     140 with client:
E --> 141     response = client.infer(triton_model, inputs, request_id=request_id, outputs=outputs)
E     143 results = {}
E     144 for col in outputs_list:
E
E File /usr/local/lib/python3.8/dist-packages/tritonclient/grpc/__init__.py:1322, in InferenceServerClient.infer(self, model_name, inputs, model_version, outputs, request_id, sequence_id, sequence_start, sequence_end, priority, timeout, client_timeout, headers, compression_algorithm)
E    1320     return result
E    1321 except grpc.RpcError as rpc_error:
E -> 1322     raise_error_grpc(rpc_error)
E
E File /usr/local/lib/python3.8/dist-packages/tritonclient/grpc/__init__.py:62, in raise_error_grpc(rpc_error)
E      61 def raise_error_grpc(rpc_error):
E ---> 62     raise get_error_grpc(rpc_error) from None
E
E InferenceServerException: [StatusCode.INTERNAL] in ensemble 'ensemble_model', Failed to process the request(s) for model instance '3_queryfeast', message: TypeError: __init__(): incompatible constructor arguments. The following argument types are supported:
E 1. c_python_backend_utils.InferenceResponse(output_tensors: List[c_python_backend_utils.Tensor], error: c_python_backend_utils.TritonError = None)
E
E Invoked with: kwargs: tensors=[], error="<class 'TypeError'>, int() argument must be a string, a bytes-like object or a number, not 'NoneType', [<FrameSummary file /tmp/examples/poc_ensemble/3_queryfeast/1/model.py, line 105 in execute>, <FrameSummary file /usr/local/lib/python3.8/dist-packages/merlin/systems/dag/op_runner.py, line 38 in execute>, <FrameSummary file /usr/local/lib/python3.8/dist-packages/merlin/systems/dag/ops/feast.py, line 299 in transform>]"
E
E At:
E /tmp/examples/poc_ensemble/3_queryfeast/1/model.py(122): execute
E
E InferenceServerException: [StatusCode.INTERNAL] in ensemble 'ensemble_model', Failed to process the request(s) for model instance '3_queryfeast', message: TypeError: __init__(): incompatible constructor arguments. The following argument types are supported:
E 1. c_python_backend_utils.InferenceResponse(output_tensors: List[c_python_backend_utils.Tensor], error: c_python_backend_utils.TritonError = None)
E
E Invoked with: kwargs: tensors=[], error="<class 'TypeError'>, int() argument must be a string, a bytes-like object or a number, not 'NoneType', [<FrameSummary file /tmp/examples/poc_ensemble/3_queryfeast/1/model.py, line 105 in execute>, <FrameSummary file /usr/local/lib/python3.8/dist-packages/merlin/systems/dag/op_runner.py, line 38 in execute>, <FrameSummary file /usr/local/lib/python3.8/dist-packages/merlin/systems/dag/ops/feast.py, line 299 in transform>]"
E
E At:
E /tmp/examples/poc_ensemble/3_queryfeast/1/model.py(122): execute

/usr/local/lib/python3.8/dist-packages/nbclient/client.py:916: CellExecutionError

During handling of the above exception, another exception occurred:

def test_func():
    with testbook(
        REPO_ROOT
        / "examples"
        / "Building-and-deploying-multi-stage-RecSys"
        / "01-Building-Recommender-Systems-with-Merlin.ipynb",
        execute=False,
    ) as tb1:
        tb1.inject(
            """
            import os
            os.environ["DATA_FOLDER"] = "/tmp/data/"
            os.environ["NUM_ROWS"] = "10000"
            os.system("mkdir -p /tmp/examples")
            os.environ["BASE_DIR"] = "/tmp/examples/"
            """
        )
        tb1.execute()
        assert os.path.isdir("/tmp/examples/dlrm")
        assert os.path.isdir("/tmp/examples/feature_repo")
        assert os.path.isdir("/tmp/examples/query_tower")
        assert os.path.isfile("/tmp/examples/item_embeddings.parquet")
        assert os.path.isfile("/tmp/examples/feature_repo/user_features.py")
        assert os.path.isfile("/tmp/examples/feature_repo/item_features.py")

    with testbook(
        REPO_ROOT
        / "examples"
        / "Building-and-deploying-multi-stage-RecSys"
        / "02-Deploying-multi-stage-RecSys-with-Merlin-Systems.ipynb",
        execute=False,
    ) as tb2:
        tb2.inject(
            """
            import os
            os.environ["DATA_FOLDER"] = "/tmp/data/"
            os.environ["BASE_DIR"] = "/tmp/examples/"
            """
        )
        NUM_OF_CELLS = len(tb2.cells)
        tb2.execute_cell(list(range(0, NUM_OF_CELLS - 3)))
        top_k = tb2.ref("top_k")
        outputs = tb2.ref("outputs")
        assert outputs[0] == "ordered_ids"
      tb2.inject(
            """
            import shutil
            from merlin.models.loader.tf_utils import configure_tensorflow
            configure_tensorflow()
            from merlin.systems.triton.utils import run_ensemble_on_tritonserver
            response = run_ensemble_on_tritonserver(
                "/tmp/examples/poc_ensemble", outputs, request, "ensemble_model"
            )
            response = [x.tolist()[0] for x in response["ordered_ids"]]
            shutil.rmtree("/tmp/examples/", ignore_errors=True)
            """
        )

tests/unit/examples/test_building_deploying_multi_stage_RecSys.py:57:


/usr/local/lib/python3.8/dist-packages/testbook/client.py:237: in inject
cell = TestbookNode(self.execute_cell(inject_idx)) if run else TestbookNode(code_cell)


self = <testbook.client.TestbookNotebookClient object at 0x7f93a33b8f10>
cell = [53], kwargs = {}, cell_indexes = [53], executed_cells = [], idx = 53

def execute_cell(self, cell, **kwargs) -> Union[Dict, List[Dict]]:
    """
    Executes a cell or list of cells
    """
    if isinstance(cell, slice):
        start, stop = self._cell_index(cell.start), self._cell_index(cell.stop)
        if cell.step is not None:
            raise TestbookError('testbook does not support step argument')

        cell = range(start, stop + 1)
    elif isinstance(cell, str) or isinstance(cell, int):
        cell = [cell]

    cell_indexes = cell

    if all(isinstance(x, str) for x in cell):
        cell_indexes = [self._cell_index(tag) for tag in cell]

    executed_cells = []
    for idx in cell_indexes:
        try:
            cell = super().execute_cell(self.nb['cells'][idx], idx, **kwargs)
        except CellExecutionError as ce:
          raise TestbookRuntimeError(ce.evalue, ce, self._get_error_class(ce.ename))

E testbook.exceptions.TestbookRuntimeError: An error occurred while executing the following cell:
E ------------------
E
E import shutil
E from merlin.models.loader.tf_utils import configure_tensorflow
E configure_tensorflow()
E from merlin.systems.triton.utils import run_ensemble_on_tritonserver
E response = run_ensemble_on_tritonserver(
E "/tmp/examples/poc_ensemble", outputs, request, "ensemble_model"
E )
E response = [x.tolist()[0] for x in response["ordered_ids"]]
E shutil.rmtree("/tmp/examples/", ignore_errors=True)
E
E ------------------
E
E ---------------------------------------------------------------------------
E InferenceServerException                 Traceback (most recent call last)
E Input In [32], in <cell line: 5>()
E       3 configure_tensorflow()
E       4 from merlin.systems.triton.utils import run_ensemble_on_tritonserver
E ----> 5 response = run_ensemble_on_tritonserver(
E       6     "/tmp/examples/poc_ensemble", outputs, request, "ensemble_model"
E       7 )
E       8 response = [x.tolist()[0] for x in response["ordered_ids"]]
E       9 shutil.rmtree("/tmp/examples/", ignore_errors=True)
E
E File /usr/local/lib/python3.8/dist-packages/merlin/systems/triton/utils.py:93, in run_ensemble_on_tritonserver(tmpdir, output_columns, df, model_name)
E      91 response = None
E      92 with run_triton_server(tmpdir) as client:
E ---> 93     response = send_triton_request(df, output_columns, client=client, triton_model=model_name)
E      95 return response
E
E File /usr/local/lib/python3.8/dist-packages/merlin/systems/triton/utils.py:141, in send_triton_request(df, outputs_list, client, endpoint, request_id, triton_model)
E     139 outputs = [grpcclient.InferRequestedOutput(col) for col in outputs_list]
E     140 with client:
E --> 141     response = client.infer(triton_model, inputs, request_id=request_id, outputs=outputs)
E     143 results = {}
E     144 for col in outputs_list:
E
E File /usr/local/lib/python3.8/dist-packages/tritonclient/grpc/__init__.py:1322, in InferenceServerClient.infer(self, model_name, inputs, model_version, outputs, request_id, sequence_id, sequence_start, sequence_end, priority, timeout, client_timeout, headers, compression_algorithm)
E    1320     return result
E    1321 except grpc.RpcError as rpc_error:
E -> 1322     raise_error_grpc(rpc_error)
E
E File /usr/local/lib/python3.8/dist-packages/tritonclient/grpc/__init__.py:62, in raise_error_grpc(rpc_error)
E      61 def raise_error_grpc(rpc_error):
E ---> 62     raise get_error_grpc(rpc_error) from None
E
E InferenceServerException: [StatusCode.INTERNAL] in ensemble 'ensemble_model', Failed to process the request(s) for model instance '3_queryfeast', message: TypeError: __init__(): incompatible constructor arguments. The following argument types are supported:
E 1. c_python_backend_utils.InferenceResponse(output_tensors: List[c_python_backend_utils.Tensor], error: c_python_backend_utils.TritonError = None)
E
E Invoked with: kwargs: tensors=[], error="<class 'TypeError'>, int() argument must be a string, a bytes-like object or a number, not 'NoneType', [<FrameSummary file /tmp/examples/poc_ensemble/3_queryfeast/1/model.py, line 105 in execute>, <FrameSummary file /usr/local/lib/python3.8/dist-packages/merlin/systems/dag/op_runner.py, line 38 in execute>, <FrameSummary file /usr/local/lib/python3.8/dist-packages/merlin/systems/dag/ops/feast.py, line 299 in transform>]"
E
E At:
E /tmp/examples/poc_ensemble/3_queryfeast/1/model.py(122): execute
E
E InferenceServerException: [StatusCode.INTERNAL] in ensemble 'ensemble_model', Failed to process the request(s) for model instance '3_queryfeast', message: TypeError: __init__(): incompatible constructor arguments. The following argument types are supported:
E 1. c_python_backend_utils.InferenceResponse(output_tensors: List[c_python_backend_utils.Tensor], error: c_python_backend_utils.TritonError = None)
E
E Invoked with: kwargs: tensors=[], error="<class 'TypeError'>, int() argument must be a string, a bytes-like object or a number, not 'NoneType', [<FrameSummary file /tmp/examples/poc_ensemble/3_queryfeast/1/model.py, line 105 in execute>, <FrameSummary file /usr/local/lib/python3.8/dist-packages/merlin/systems/dag/op_runner.py, line 38 in execute>, <FrameSummary file /usr/local/lib/python3.8/dist-packages/merlin/systems/dag/ops/feast.py, line 299 in transform>]"
E
E At:
E /tmp/examples/poc_ensemble/3_queryfeast/1/model.py(122): execute

/usr/local/lib/python3.8/dist-packages/testbook/client.py:135: TestbookRuntimeError
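
The failure surfaces through testbook's inject → execute_cell path shown above. For reference, a minimal sketch of the same pattern test_func uses (open the notebook without executing, inject setup, run, then read variables back); the notebook path here is illustrative:

# Sketch of the testbook pattern used by test_func (illustrative notebook path).
from testbook import testbook

@testbook("examples/demo.ipynb", execute=False)
def test_notebook(tb):
    # Inject setup code as a new cell before running the notebook.
    tb.inject('import os; os.environ["BASE_DIR"] = "/tmp/examples/"')
    tb.execute()  # raises testbook.exceptions.TestbookRuntimeError on a failing cell
    outputs = tb.ref("outputs")  # reference a variable defined in the notebook
    assert outputs[0] == "ordered_ids"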
----------------------------- Captured stdout call -----------------------------
Signal (2) received.
----------------------------- Captured stderr call -----------------------------
2022-08-05 01:01:25.812569: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-08-05 01:01:27.786541: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 1627 MB memory: -> device: 0, name: Tesla P100-DGXS-16GB, pci bus id: 0000:07:00.0, compute capability: 6.0
2022-08-05 01:01:27.787243: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 15153 MB memory: -> device: 1, name: Tesla P100-DGXS-16GB, pci bus id: 0000:08:00.0, compute capability: 6.0
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
File "/usr/lib/python3.8/logging/init.py", line 2127, in shutdown
h.close()
File "/usr/local/lib/python3.8/dist-packages/absl/logging/init.py", line 934, in close
self.stream.close()
File "/usr/local/lib/python3.8/dist-packages/ipykernel/iostream.py", line 438, in close
self.watch_fd_thread.join()
AttributeError: 'OutStream' object has no attribute 'watch_fd_thread'
WARNING clustering 250 points to 32 centroids: please provide at least 1248 training points
2022-08-05 01:02:58.735174: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-08-05 01:03:00.711439: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 1627 MB memory: -> device: 0, name: Tesla P100-DGXS-16GB, pci bus id: 0000:07:00.0, compute capability: 6.0
2022-08-05 01:03:00.712157: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 15153 MB memory: -> device: 1, name: Tesla P100-DGXS-16GB, pci bus id: 0000:08:00.0, compute capability: 6.0
I0805 01:03:05.875937 13473 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7fbf50000000' with size 268435456
I0805 01:03:05.876727 13473 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I0805 01:03:05.884054 13473 model_repository_manager.cc:1191] loading: 0_queryfeast:1
I0805 01:03:05.984394 13473 model_repository_manager.cc:1191] loading: 1_predicttensorflow:1
I0805 01:03:05.991446 13473 python.cc:2388] TRITONBACKEND_ModelInstanceInitialize: 0_queryfeast (GPU device 0)
I0805 01:03:06.084780 13473 model_repository_manager.cc:1191] loading: 2_queryfaiss:1
I0805 01:03:06.184981 13473 model_repository_manager.cc:1191] loading: 3_queryfeast:1
I0805 01:03:06.285230 13473 model_repository_manager.cc:1191] loading: 4_unrollfeatures:1
I0805 01:03:06.385469 13473 model_repository_manager.cc:1191] loading: 5_predicttensorflow:1
I0805 01:03:06.485704 13473 model_repository_manager.cc:1191] loading: 6_softmaxsampling:1
I0805 01:03:08.320495 13473 model_repository_manager.cc:1345] successfully loaded '0_queryfeast' version 1
I0805 01:03:08.599587 13473 tensorflow.cc:2181] TRITONBACKEND_Initialize: tensorflow
I0805 01:03:08.599627 13473 tensorflow.cc:2191] Triton TRITONBACKEND API version: 1.9
I0805 01:03:08.599634 13473 tensorflow.cc:2197] 'tensorflow' TRITONBACKEND API version: 1.9
I0805 01:03:08.599640 13473 tensorflow.cc:2221] backend configuration:
{"cmdline":{"auto-complete-config":"false","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","version":"2","default-max-batch-size":"4"}}
I0805 01:03:08.599676 13473 tensorflow.cc:2281] TRITONBACKEND_ModelInitialize: 1_predicttensorflow (version 1)
I0805 01:03:08.601294 13473 tensorflow.cc:2281] TRITONBACKEND_ModelInitialize: 5_predicttensorflow (version 1)
I0805 01:03:08.605148 13473 tensorflow.cc:2330] TRITONBACKEND_ModelInstanceInitialize: 1_predicttensorflow (GPU device 0)
2022-08-05 01:03:08.953837: I tensorflow/cc/saved_model/reader.cc:43] Reading SavedModel from: /tmp/examples/poc_ensemble/1_predicttensorflow/1/model.savedmodel
2022-08-05 01:03:08.957920: I tensorflow/cc/saved_model/reader.cc:78] Reading meta graph with tags { serve }
2022-08-05 01:03:08.957945: I tensorflow/cc/saved_model/reader.cc:119] Reading SavedModel debug info (if present) from: /tmp/examples/poc_ensemble/1_predicttensorflow/1/model.savedmodel
2022-08-05 01:03:08.958037: I tensorflow/core/platform/cpu_feature_guard.cc:152] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE3 SSE4.1 SSE4.2 AVX
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-08-05 01:03:08.995029: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 12648 MB memory: -> device: 0, name: Tesla P100-DGXS-16GB, pci bus id: 0000:07:00.0, compute capability: 6.0
2022-08-05 01:03:09.041643: I tensorflow/cc/saved_model/loader.cc:230] Restoring SavedModel bundle.
2022-08-05 01:03:09.121244: I tensorflow/cc/saved_model/loader.cc:214] Running initialization op on SavedModel bundle at path: /tmp/examples/poc_ensemble/1_predicttensorflow/1/model.savedmodel
2022-08-05 01:03:09.145522: I tensorflow/cc/saved_model/loader.cc:321] SavedModel load for tags { serve }; Status: success: OK. Took 191704 microseconds.
I0805 01:03:09.145639 13473 python.cc:2388] TRITONBACKEND_ModelInstanceInitialize: 2_queryfaiss (GPU device 0)
I0805 01:03:09.145740 13473 model_repository_manager.cc:1345] successfully loaded '1_predicttensorflow' version 1
I0805 01:03:11.496242 13473 python.cc:2388] TRITONBACKEND_ModelInstanceInitialize: 3_queryfeast (GPU device 0)
I0805 01:03:11.500232 13473 model_repository_manager.cc:1345] successfully loaded '2_queryfaiss' version 1
I0805 01:03:13.821030 13473 tensorflow.cc:2330] TRITONBACKEND_ModelInstanceInitialize: 5_predicttensorflow (GPU device 0)
I0805 01:03:13.821291 13473 model_repository_manager.cc:1345] successfully loaded '3_queryfeast' version 1
2022-08-05 01:03:13.821911: I tensorflow/cc/saved_model/reader.cc:43] Reading SavedModel from: /tmp/examples/poc_ensemble/5_predicttensorflow/1/model.savedmodel
2022-08-05 01:03:13.847211: I tensorflow/cc/saved_model/reader.cc:78] Reading meta graph with tags { serve }
2022-08-05 01:03:13.847285: I tensorflow/cc/saved_model/reader.cc:119] Reading SavedModel debug info (if present) from: /tmp/examples/poc_ensemble/5_predicttensorflow/1/model.savedmodel
2022-08-05 01:03:13.849746: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 12648 MB memory: -> device: 0, name: Tesla P100-DGXS-16GB, pci bus id: 0000:07:00.0, compute capability: 6.0
2022-08-05 01:03:13.872193: I tensorflow/cc/saved_model/loader.cc:230] Restoring SavedModel bundle.
2022-08-05 01:03:14.027693: I tensorflow/cc/saved_model/loader.cc:214] Running initialization op on SavedModel bundle at path: /tmp/examples/poc_ensemble/5_predicttensorflow/1/model.savedmodel
2022-08-05 01:03:14.078009: I tensorflow/cc/saved_model/loader.cc:321] SavedModel load for tags { serve }; Status: success: OK. Took 256113 microseconds.
I0805 01:03:14.078172 13473 python.cc:2388] TRITONBACKEND_ModelInstanceInitialize: 4_unrollfeatures (GPU device 0)
I0805 01:03:14.078287 13473 model_repository_manager.cc:1345] successfully loaded '5_predicttensorflow' version 1
I0805 01:03:16.140433 13473 python.cc:2388] TRITONBACKEND_ModelInstanceInitialize: 6_softmaxsampling (GPU device 0)
I0805 01:03:16.140716 13473 model_repository_manager.cc:1345] successfully loaded '4_unrollfeatures' version 1
I0805 01:03:18.176042 13473 model_repository_manager.cc:1345] successfully loaded '6_softmaxsampling' version 1
I0805 01:03:18.178870 13473 model_repository_manager.cc:1191] loading: ensemble_model:1
I0805 01:03:18.279694 13473 model_repository_manager.cc:1345] successfully loaded 'ensemble_model' version 1
I0805 01:03:18.279875 13473 server.cc:556]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0805 01:03:18.279988 13473 server.cc:583]
+------------+-----------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Backend | Path | Config |
+------------+-----------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| python | /opt/tritonserver/backends/python/libtriton_python.so | {"cmdline":{"auto-complete-config":"false","min-compute-capability":"6.000000","backend-directory":"/opt/tritonserver/backends","default-max-batch-size":"4"}} |
| tensorflow | /opt/tritonserver/backends/tensorflow2/libtriton_tensorflow2.so | {"cmdline":{"auto-complete-config":"false","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","version":"2","default-max-batch-size":"4"}} |
+------------+-----------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0805 01:03:18.280117 13473 server.cc:626]
+---------------------+---------+--------+
| Model | Version | Status |
+---------------------+---------+--------+
| 0_queryfeast | 1 | READY |
| 1_predicttensorflow | 1 | READY |
| 2_queryfaiss | 1 | READY |
| 3_queryfeast | 1 | READY |
| 4_unrollfeatures | 1 | READY |
| 5_predicttensorflow | 1 | READY |
| 6_softmaxsampling | 1 | READY |
| ensemble_model | 1 | READY |
+---------------------+---------+--------+

I0805 01:03:18.342700 13473 metrics.cc:650] Collecting metrics for GPU 0: Tesla P100-DGXS-16GB
I0805 01:03:18.343580 13473 tritonserver.cc:2138]
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Value |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.22.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics trace |
| model_repository_path[0] | /tmp/examples/poc_ensemble |
| model_control_mode | MODE_NONE |
| strict_model_config | 1 |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| response_cache_byte_size | 0 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0805 01:03:18.344757 13473 grpc_server.cc:4589] Started GRPCInferenceService at 0.0.0.0:8001
I0805 01:03:18.345294 13473 http_server.cc:3303] Started HTTPService at 0.0.0.0:8000
I0805 01:03:18.386495 13473 http_server.cc:178] Started Metrics Service at 0.0.0.0:8002
W0805 01:03:19.372036 13473 metrics.cc:468] Unable to get energy consumption for GPU 0. Status:Success, value:0
W0805 01:03:19.372110 13473 metrics.cc:507] Unable to get memory usage for GPU 0. Memory usage status:Success, value:0. Memory total status:Success, value:0
W0805 01:03:20.372268 13473 metrics.cc:468] Unable to get energy consumption for GPU 0. Status:Success, value:0
W0805 01:03:20.372320 13473 metrics.cc:507] Unable to get memory usage for GPU 0. Memory usage status:Success, value:0. Memory total status:Success, value:0
W0805 01:03:21.389741 13473 metrics.cc:468] Unable to get energy consumption for GPU 0. Status:Success, value:0
W0805 01:03:21.389795 13473 metrics.cc:507] Unable to get memory usage for GPU 0. Memory usage status:Success, value:0. Memory total status:Success, value:0
0805 01:03:22.140926 13730 pb_stub.cc:749] Failed to process the request(s) for model '3_queryfeast', message: TypeError: __init__(): incompatible constructor arguments. The following argument types are supported:
1. c_python_backend_utils.InferenceResponse(output_tensors: List[c_python_backend_utils.Tensor], error: c_python_backend_utils.TritonError = None)

Invoked with: kwargs: tensors=[], error="<class 'TypeError'>, int() argument must be a string, a bytes-like object or a number, not 'NoneType', [<FrameSummary file /tmp/examples/poc_ensemble/3_queryfeast/1/model.py, line 105 in execute>, <FrameSummary file /usr/local/lib/python3.8/dist-packages/merlin/systems/dag/op_runner.py, line 38 in execute>, <FrameSummary file /usr/local/lib/python3.8/dist-packages/merlin/systems/dag/ops/feast.py, line 299 in transform>]"

At:
/tmp/examples/poc_ensemble/3_queryfeast/1/model.py(122): execute

I0805 01:03:22.145175 13473 server.cc:257] Waiting for in-flight requests to complete.
I0805 01:03:22.145222 13473 server.cc:273] Timeout 30: Found 0 model versions that have in-flight inferences
I0805 01:03:22.145241 13473 model_repository_manager.cc:1223] unloading: ensemble_model:1
I0805 01:03:22.145337 13473 model_repository_manager.cc:1223] unloading: 6_softmaxsampling:1
I0805 01:03:22.145404 13473 model_repository_manager.cc:1223] unloading: 5_predicttensorflow:1
I0805 01:03:22.145425 13473 model_repository_manager.cc:1328] successfully unloaded 'ensemble_model' version 1
I0805 01:03:22.145513 13473 model_repository_manager.cc:1223] unloading: 4_unrollfeatures:1
I0805 01:03:22.145561 13473 model_repository_manager.cc:1223] unloading: 3_queryfeast:1
I0805 01:03:22.145617 13473 model_repository_manager.cc:1223] unloading: 2_queryfaiss:1
I0805 01:03:22.145645 13473 tensorflow.cc:2368] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I0805 01:03:22.145676 13473 model_repository_manager.cc:1223] unloading: 1_predicttensorflow:1
I0805 01:03:22.145760 13473 model_repository_manager.cc:1223] unloading: 0_queryfeast:1
I0805 01:03:22.145803 13473 server.cc:288] All models are stopped, unloading models
I0805 01:03:22.145822 13473 tensorflow.cc:2307] TRITONBACKEND_ModelFinalize: delete model state
I0805 01:03:22.145826 13473 server.cc:295] Timeout 30: Found 7 live models and 0 in-flight non-inference requests
I0805 01:03:22.145862 13473 tensorflow.cc:2368] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I0805 01:03:22.145946 13473 tensorflow.cc:2307] TRITONBACKEND_ModelFinalize: delete model state
I0805 01:03:22.153335 13473 model_repository_manager.cc:1328] successfully unloaded '1_predicttensorflow' version 1
I0805 01:03:22.166202 13473 model_repository_manager.cc:1328] successfully unloaded '5_predicttensorflow' version 1
I0805 01:03:23.145963 13473 server.cc:295] Timeout 29: Found 5 live models and 0 in-flight non-inference requests
I0805 01:03:23.445343 13473 model_repository_manager.cc:1328] successfully unloaded '4_unrollfeatures' version 1
I0805 01:03:23.728285 13473 model_repository_manager.cc:1328] successfully unloaded '2_queryfaiss' version 1
I0805 01:03:23.752261 13473 model_repository_manager.cc:1328] successfully unloaded '6_softmaxsampling' version 1
I0805 01:03:24.146143 13473 server.cc:295] Timeout 28: Found 2 live models and 0 in-flight non-inference requests
I0805 01:03:25.146274 13473 server.cc:295] Timeout 27: Found 2 live models and 0 in-flight non-inference requests
I0805 01:03:26.146411 13473 server.cc:295] Timeout 26: Found 2 live models and 0 in-flight non-inference requests
I0805 01:03:27.146548 13473 server.cc:295] Timeout 25: Found 2 live models and 0 in-flight non-inference requests
I0805 01:03:28.146684 13473 server.cc:295] Timeout 24: Found 2 live models and 0 in-flight non-inference requests
I0805 01:03:29.146822 13473 server.cc:295] Timeout 23: Found 2 live models and 0 in-flight non-inference requests
I0805 01:03:30.146947 13473 server.cc:295] Timeout 22: Found 2 live models and 0 in-flight non-inference requests
I0805 01:03:31.147083 13473 server.cc:295] Timeout 21: Found 2 live models and 0 in-flight non-inference requests
I0805 01:03:32.147217 13473 server.cc:295] Timeout 20: Found 2 live models and 0 in-flight non-inference requests
I0805 01:03:33.147355 13473 server.cc:295] Timeout 19: Found 2 live models and 0 in-flight non-inference requests
I0805 01:03:34.147490 13473 server.cc:295] Timeout 18: Found 2 live models and 0 in-flight non-inference requests
/usr/local/lib/python3.8/dist-packages/merlin/systems/dag/ops/feast.py:15: DeprecationWarning: np.float is a deprecated alias for the builtin float. To silence this warning, use float by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use np.float64 here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
ValueType.FLOAT: (np.float, False, False),
I0805 01:03:34.826026 13473 model_repository_manager.cc:1328] successfully unloaded '0_queryfeast' version 1
I0805 01:03:35.147620 13473 server.cc:295] Timeout 17: Found 1 live models and 0 in-flight non-inference requests
/usr/local/lib/python3.8/dist-packages/merlin/systems/dag/ops/feast.py:15: DeprecationWarning: np.float is a deprecated alias for the builtin float. To silence this warning, use float by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use np.float64 here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
ValueType.FLOAT: (np.float, False, False),
I0805 01:03:35.565480 13473 model_repository_manager.cc:1328] successfully unloaded '3_queryfeast' version 1
I0805 01:03:36.147745 13473 server.cc:295] Timeout 16: Found 0 live models and 0 in-flight non-inference requests
=========================== short test summary info ============================
FAILED tests/unit/examples/test_building_deploying_multi_stage_RecSys.py::test_func
=================== 1 failed, 2 passed in 238.57s (0:03:58) ====================
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://github.com/gitapi/repos/NVIDIA-Merlin/Merlin/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_merlin] $ /bin/bash /tmp/jenkins14902039937927520141.sh
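
Both traces bottom out in the same place: the 3_queryfeast model.py passes a plain string as the error keyword to InferenceResponse, which the Triton Python backend rejects. A hedged sketch of the constructor shapes the backend does accept, per the supported signature printed in the error; the tensor name and value are illustrative:

# Sketch of InferenceResponse construction accepted by the Triton Python
# backend, per the signature in the error above. triton_python_backend_utils
# is only importable inside a Triton Python backend model; names are illustrative.
import numpy as np
import triton_python_backend_utils as pb_utils

def build_response(ok: bool):
    if ok:
        tensor = pb_utils.Tensor("ordered_ids", np.array([[1, 2, 3]], dtype=np.int64))
        return pb_utils.InferenceResponse(output_tensors=[tensor])
    # Passing a plain string for `error` raises the TypeError seen above;
    # it must be wrapped in a TritonError.
    return pb_utils.InferenceResponse(
        output_tensors=[], error=pb_utils.TritonError("transform failed")
    )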

@mikemckiernan
Member

rerun tests
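
Unrelated to the Triton failure, the captured stderr above also flags a NumPy deprecation at merlin/systems/dag/ops/feast.py line 15, and the warning text itself suggests the fix. A sketch, assuming only the single mapping line visible in the log:

# Deprecated alias and its NumPy-recommended replacement. Only the one
# mapping line from feast.py appears in the log; the enclosing dict is assumed.
import numpy as np

value_type_map = {
    "FLOAT": (np.float64, False, False),  # was: (np.float, False, False)
}
assert value_type_map["FLOAT"][0] is np.float64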

@nvidia-merlin-bot
Contributor Author

CI Results
GitHub pull request #500 of commit 371f0d16520f98f994078662ba619f081a2d6fe7, no merge conflicts.
Running as SYSTEM
Setting status of 371f0d16520f98f994078662ba619f081a2d6fe7 to PENDING with url https://10.20.13.93:8080/job/merlin_merlin/314/console and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_merlin
using credential systems-login
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/Merlin # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/Merlin
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/Merlin +refs/pull/500/*:refs/remotes/origin/pr/500/* # timeout=10
 > git rev-parse 371f0d16520f98f994078662ba619f081a2d6fe7^{commit} # timeout=10
Checking out Revision 371f0d16520f98f994078662ba619f081a2d6fe7 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 371f0d16520f98f994078662ba619f081a2d6fe7 # timeout=10
Commit message: "Updates from containers"
 > git rev-list --no-walk 371f0d16520f98f994078662ba619f081a2d6fe7 # timeout=10
[merlin_merlin] $ /bin/bash /tmp/jenkins17055726738672546747.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/merlin_merlin/merlin
plugins: anyio-3.6.1, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 3 items

tests/unit/test_version.py . [ 33%]
tests/unit/examples/test_building_deploying_multi_stage_RecSys.py F [ 66%]
tests/unit/examples/test_scaling_criteo_merlin_models.py . [100%]

=================================== FAILURES ===================================
__________________________________ test_func ___________________________________

self = <testbook.client.TestbookNotebookClient object at 0x7fc1b86c2bb0>
cell = [53], kwargs = {}, cell_indexes = [53], executed_cells = [], idx = 53

def execute_cell(self, cell, **kwargs) -> Union[Dict, List[Dict]]:
    """
    Executes a cell or list of cells
    """
    if isinstance(cell, slice):
        start, stop = self._cell_index(cell.start), self._cell_index(cell.stop)
        if cell.step is not None:
            raise TestbookError('testbook does not support step argument')

        cell = range(start, stop + 1)
    elif isinstance(cell, str) or isinstance(cell, int):
        cell = [cell]

    cell_indexes = cell

    if all(isinstance(x, str) for x in cell):
        cell_indexes = [self._cell_index(tag) for tag in cell]

    executed_cells = []
    for idx in cell_indexes:
        try:
          cell = super().execute_cell(self.nb['cells'][idx], idx, **kwargs)

/usr/local/lib/python3.8/dist-packages/testbook/client.py:133:


args = (<testbook.client.TestbookNotebookClient object at 0x7fc1b86c2bb0>, {'id': 'c5a56806', 'cell_type': 'code', 'metadata'...ast.py, line 299 in transform>]"\n\nAt:\n /tmp/examples/poc_ensemble/3_queryfeast/1/model.py(122): execute\n']}]}, 53)
kwargs = {}

def wrapped(*args, **kwargs):
  return just_run(coro(*args, **kwargs))

/usr/local/lib/python3.8/dist-packages/nbclient/util.py:85:


coro = <coroutine object NotebookClient.async_execute_cell at 0x7fc1b85b43c0>

def just_run(coro: Awaitable) -> Any:
    """Make the coroutine run, even if there is an event loop running (using nest_asyncio)"""
    try:
        loop = asyncio.get_running_loop()
    except RuntimeError:
        loop = None
    if loop is None:
        had_running_loop = False
        loop = asyncio.new_event_loop()
        asyncio.set_event_loop(loop)
    else:
        had_running_loop = True
    if had_running_loop:
        # if there is a running loop, we patch using nest_asyncio
        # to have reentrant event loops
        check_ipython()
        import nest_asyncio

        nest_asyncio.apply()
        check_patch_tornado()
  return loop.run_until_complete(coro)

/usr/local/lib/python3.8/dist-packages/nbclient/util.py:60:


self = <_UnixSelectorEventLoop running=False closed=False debug=False>
future = <Task finished name='Task-369' coro=<NotebookClient.async_execute_cell() done, defined at /usr/local/lib/python3.8/dis...ps/feast.py, line 299 in transform>]"\n\nAt:\n /tmp/examples/poc_ensemble/3_queryfeast/1/model.py(122): execute\n\n')>

def run_until_complete(self, future):
    """Run until the Future is done.

    If the argument is a coroutine, it is wrapped in a Task.

    WARNING: It would be disastrous to call run_until_complete()
    with the same coroutine twice -- it would wrap it in two
    different Tasks and that can't be good.

    Return the Future's result, or raise its exception.
    """
    self._check_closed()
    self._check_running()

    new_task = not futures.isfuture(future)
    future = tasks.ensure_future(future, loop=self)
    if new_task:
        # An exception is raised if the future didn't complete, so there
        # is no need to log the "destroy pending task" message
        future._log_destroy_pending = False

    future.add_done_callback(_run_until_complete_cb)
    try:
        self.run_forever()
    except:
        if new_task and future.done() and not future.cancelled():
            # The coroutine raised a BaseException. Consume the exception
            # to not log a warning, the caller doesn't have access to the
            # local task.
            future.exception()
        raise
    finally:
        future.remove_done_callback(_run_until_complete_cb)
    if not future.done():
        raise RuntimeError('Event loop stopped before Future completed.')
  return future.result()

/usr/lib/python3.8/asyncio/base_events.py:616:


self = <testbook.client.TestbookNotebookClient object at 0x7fc1b86c2bb0>
cell = {'id': 'c5a56806', 'cell_type': 'code', 'metadata': {'execution': {'iopub.status.busy': '2022-08-05T01:08:01.850416Z',...ps/feast.py, line 299 in transform>]"\n\nAt:\n /tmp/examples/poc_ensemble/3_queryfeast/1/model.py(122): execute\n']}]}
cell_index = 53, execution_count = None, store_history = True

async def async_execute_cell(
    self,
    cell: NotebookNode,
    cell_index: int,
    execution_count: t.Optional[int] = None,
    store_history: bool = True,
) -> NotebookNode:
    """
    Executes a single code cell.

    To execute all cells see :meth:`execute`.

    Parameters
    ----------
    cell : nbformat.NotebookNode
        The cell which is currently being processed.
    cell_index : int
        The position of the cell within the notebook object.
    execution_count : int
        The execution count to be assigned to the cell (default: Use kernel response)
    store_history : bool
        Determines if history should be stored in the kernel (default: False).
        Specific to ipython kernels, which can store command histories.

    Returns
    -------
    output : dict
        The execution output payload (or None for no output).

    Raises
    ------
    CellExecutionError
        If execution failed and should raise an exception, this will be raised
        with defaults about the failure.

    Returns
    -------
    cell : NotebookNode
        The cell which was just processed.
    """
    assert self.kc is not None

    await run_hook(self.on_cell_start, cell=cell, cell_index=cell_index)

    if cell.cell_type != 'code' or not cell.source.strip():
        self.log.debug("Skipping non-executing cell %s", cell_index)
        return cell

    if self.skip_cells_with_tag in cell.metadata.get("tags", []):
        self.log.debug("Skipping tagged cell %s", cell_index)
        return cell

    if self.record_timing:  # clear execution metadata prior to execution
        cell['metadata']['execution'] = {}

    self.log.debug("Executing cell:\n%s", cell.source)

    cell_allows_errors = (not self.force_raise_errors) and (
        self.allow_errors or "raises-exception" in cell.metadata.get("tags", [])
    )

    await run_hook(self.on_cell_execute, cell=cell, cell_index=cell_index)
    parent_msg_id = await ensure_async(
        self.kc.execute(
            cell.source, store_history=store_history, stop_on_error=not cell_allows_errors
        )
    )
    await run_hook(self.on_cell_complete, cell=cell, cell_index=cell_index)
    # We launched a code cell to execute
    self.code_cells_executed += 1
    exec_timeout = self._get_timeout(cell)

    cell.outputs = []
    self.clear_before_next_output = False

    task_poll_kernel_alive = asyncio.ensure_future(self._async_poll_kernel_alive())
    task_poll_output_msg = asyncio.ensure_future(
        self._async_poll_output_msg(parent_msg_id, cell, cell_index)
    )
    self.task_poll_for_reply = asyncio.ensure_future(
        self._async_poll_for_reply(
            parent_msg_id, cell, exec_timeout, task_poll_output_msg, task_poll_kernel_alive
        )
    )
    try:
        exec_reply = await self.task_poll_for_reply
    except asyncio.CancelledError:
        # can only be cancelled by task_poll_kernel_alive when the kernel is dead
        task_poll_output_msg.cancel()
        raise DeadKernelError("Kernel died")
    except Exception as e:
        # Best effort to cancel request if it hasn't been resolved
        try:
            # Check if the task_poll_output is doing the raising for us
            if not isinstance(e, CellControlSignal):
                task_poll_output_msg.cancel()
        finally:
            raise

    if execution_count:
        cell['execution_count'] = execution_count
    await run_hook(
        self.on_cell_executed, cell=cell, cell_index=cell_index, execute_reply=exec_reply
    )
  await self._check_raise_for_error(cell, cell_index, exec_reply)

/usr/local/lib/python3.8/dist-packages/nbclient/client.py:1022:


self = <testbook.client.TestbookNotebookClient object at 0x7fc1b86c2bb0>
cell = {'id': 'c5a56806', 'cell_type': 'code', 'metadata': {'execution': {'iopub.status.busy': '2022-08-05T01:08:01.850416Z',...ps/feast.py, line 299 in transform>]"\n\nAt:\n /tmp/examples/poc_ensemble/3_queryfeast/1/model.py(122): execute\n']}]}
cell_index = 53
exec_reply = {'buffers': [], 'content': {'ename': 'InferenceServerException', 'engine_info': {'engine_id': -1, 'engine_uuid': '33b6...e, 'engine': '33b62233-f3fd-43c9-a40c-54eca6632ccf', 'started': '2022-08-05T01:08:01.850725Z', 'status': 'error'}, ...}

async def _check_raise_for_error(
    self, cell: NotebookNode, cell_index: int, exec_reply: t.Optional[t.Dict]
) -> None:

    if exec_reply is None:
        return None

    exec_reply_content = exec_reply['content']
    if exec_reply_content['status'] != 'error':
        return None

    cell_allows_errors = (not self.force_raise_errors) and (
        self.allow_errors
        or exec_reply_content.get('ename') in self.allow_error_names
        or "raises-exception" in cell.metadata.get("tags", [])
    )
    await run_hook(
        self.on_cell_error, cell=cell, cell_index=cell_index, execute_reply=exec_reply
    )
    if not cell_allows_errors:
      raise CellExecutionError.from_cell_and_msg(cell, exec_reply_content)

E nbclient.exceptions.CellExecutionError: An error occurred while executing the following cell:
E ------------------
E
E import shutil
E from merlin.models.loader.tf_utils import configure_tensorflow
E configure_tensorflow()
E from merlin.systems.triton.utils import run_ensemble_on_tritonserver
E response = run_ensemble_on_tritonserver(
E "/tmp/examples/poc_ensemble", outputs, request, "ensemble_model"
E )
E response = [x.tolist()[0] for x in response["ordered_ids"]]
E shutil.rmtree("/tmp/examples/", ignore_errors=True)
E
E ------------------
E
E ---------------------------------------------------------------------------
E InferenceServerException                Traceback (most recent call last)
E Input In [32], in <cell line: 5>()
E       3 configure_tensorflow()
E       4 from merlin.systems.triton.utils import run_ensemble_on_tritonserver
E ----> 5 response = run_ensemble_on_tritonserver(
E       6     "/tmp/examples/poc_ensemble", outputs, request, "ensemble_model"
E       7 )
E       8 response = [x.tolist()[0] for x in response["ordered_ids"]]
E       9 shutil.rmtree("/tmp/examples/", ignore_errors=True)
E
E File /usr/local/lib/python3.8/dist-packages/merlin/systems/triton/utils.py:93, in run_ensemble_on_tritonserver(tmpdir, output_columns, df, model_name)
E      91 response = None
E      92 with run_triton_server(tmpdir) as client:
E ---> 93     response = send_triton_request(df, output_columns, client=client, triton_model=model_name)
E      95 return response
E
E File /usr/local/lib/python3.8/dist-packages/merlin/systems/triton/utils.py:141, in send_triton_request(df, outputs_list, client, endpoint, request_id, triton_model)
E     139 outputs = [grpcclient.InferRequestedOutput(col) for col in outputs_list]
E     140 with client:
E --> 141     response = client.infer(triton_model, inputs, request_id=request_id, outputs=outputs)
E     143 results = {}
E     144 for col in outputs_list:
E
E File /usr/local/lib/python3.8/dist-packages/tritonclient/grpc/__init__.py:1322, in InferenceServerClient.infer(self, model_name, inputs, model_version, outputs, request_id, sequence_id, sequence_start, sequence_end, priority, timeout, client_timeout, headers, compression_algorithm)
E    1320     return result
E    1321 except grpc.RpcError as rpc_error:
E -> 1322     raise_error_grpc(rpc_error)
E
E File /usr/local/lib/python3.8/dist-packages/tritonclient/grpc/__init__.py:62, in raise_error_grpc(rpc_error)
E      61 def raise_error_grpc(rpc_error):
E ---> 62     raise get_error_grpc(rpc_error) from None
E
E InferenceServerException: [StatusCode.INTERNAL] in ensemble 'ensemble_model', Failed to process the request(s) for model instance '3_queryfeast', message: TypeError: __init__(): incompatible constructor arguments. The following argument types are supported:
E 1. c_python_backend_utils.InferenceResponse(output_tensors: List[c_python_backend_utils.Tensor], error: c_python_backend_utils.TritonError = None)
E
E Invoked with: kwargs: tensors=[], error="<class 'TypeError'>, int() argument must be a string, a bytes-like object or a number, not 'NoneType', [<FrameSummary file /tmp/examples/poc_ensemble/3_queryfeast/1/model.py, line 105 in execute>, <FrameSummary file /usr/local/lib/python3.8/dist-packages/merlin/systems/dag/op_runner.py, line 38 in execute>, <FrameSummary file /usr/local/lib/python3.8/dist-packages/merlin/systems/dag/ops/feast.py, line 299 in transform>]"
E
E At:
E /tmp/examples/poc_ensemble/3_queryfeast/1/model.py(122): execute
E
E InferenceServerException: [StatusCode.INTERNAL] in ensemble 'ensemble_model', Failed to process the request(s) for model instance '3_queryfeast', message: TypeError: __init__(): incompatible constructor arguments. The following argument types are supported:
E 1. c_python_backend_utils.InferenceResponse(output_tensors: List[c_python_backend_utils.Tensor], error: c_python_backend_utils.TritonError = None)
E
E Invoked with: kwargs: tensors=[], error="<class 'TypeError'>, int() argument must be a string, a bytes-like object or a number, not 'NoneType', [<FrameSummary file /tmp/examples/poc_ensemble/3_queryfeast/1/model.py, line 105 in execute>, <FrameSummary file /usr/local/lib/python3.8/dist-packages/merlin/systems/dag/op_runner.py, line 38 in execute>, <FrameSummary file /usr/local/lib/python3.8/dist-packages/merlin/systems/dag/ops/feast.py, line 299 in transform>]"
E
E At:
E /tmp/examples/poc_ensemble/3_queryfeast/1/model.py(122): execute

/usr/local/lib/python3.8/dist-packages/nbclient/client.py:916: CellExecutionError

During handling of the above exception, another exception occurred:

def test_func():
    with testbook(
        REPO_ROOT
        / "examples"
        / "Building-and-deploying-multi-stage-RecSys"
        / "01-Building-Recommender-Systems-with-Merlin.ipynb",
        execute=False,
    ) as tb1:
        tb1.inject(
            """
            import os
            os.environ["DATA_FOLDER"] = "/tmp/data/"
            os.environ["NUM_ROWS"] = "10000"
            os.system("mkdir -p /tmp/examples")
            os.environ["BASE_DIR"] = "/tmp/examples/"
            """
        )
        tb1.execute()
        assert os.path.isdir("/tmp/examples/dlrm")
        assert os.path.isdir("/tmp/examples/feature_repo")
        assert os.path.isdir("/tmp/examples/query_tower")
        assert os.path.isfile("/tmp/examples/item_embeddings.parquet")
        assert os.path.isfile("/tmp/examples/feature_repo/user_features.py")
        assert os.path.isfile("/tmp/examples/feature_repo/item_features.py")

    with testbook(
        REPO_ROOT
        / "examples"
        / "Building-and-deploying-multi-stage-RecSys"
        / "02-Deploying-multi-stage-RecSys-with-Merlin-Systems.ipynb",
        execute=False,
    ) as tb2:
        tb2.inject(
            """
            import os
            os.environ["DATA_FOLDER"] = "/tmp/data/"
            os.environ["BASE_DIR"] = "/tmp/examples/"
            """
        )
        NUM_OF_CELLS = len(tb2.cells)
        tb2.execute_cell(list(range(0, NUM_OF_CELLS - 3)))
        top_k = tb2.ref("top_k")
        outputs = tb2.ref("outputs")
        assert outputs[0] == "ordered_ids"
      tb2.inject(
            """
            import shutil
            from merlin.models.loader.tf_utils import configure_tensorflow
            configure_tensorflow()
            from merlin.systems.triton.utils import run_ensemble_on_tritonserver
            response = run_ensemble_on_tritonserver(
                "/tmp/examples/poc_ensemble", outputs, request, "ensemble_model"
            )
            response = [x.tolist()[0] for x in response["ordered_ids"]]
            shutil.rmtree("/tmp/examples/", ignore_errors=True)
            """
        )

tests/unit/examples/test_building_deploying_multi_stage_RecSys.py:57:


/usr/local/lib/python3.8/dist-packages/testbook/client.py:237: in inject
cell = TestbookNode(self.execute_cell(inject_idx)) if run else TestbookNode(code_cell)


self = <testbook.client.TestbookNotebookClient object at 0x7fc1b86c2bb0>
cell = [53], kwargs = {}, cell_indexes = [53], executed_cells = [], idx = 53

def execute_cell(self, cell, **kwargs) -> Union[Dict, List[Dict]]:
    """
    Executes a cell or list of cells
    """
    if isinstance(cell, slice):
        start, stop = self._cell_index(cell.start), self._cell_index(cell.stop)
        if cell.step is not None:
            raise TestbookError('testbook does not support step argument')

        cell = range(start, stop + 1)
    elif isinstance(cell, str) or isinstance(cell, int):
        cell = [cell]

    cell_indexes = cell

    if all(isinstance(x, str) for x in cell):
        cell_indexes = [self._cell_index(tag) for tag in cell]

    executed_cells = []
    for idx in cell_indexes:
        try:
            cell = super().execute_cell(self.nb['cells'][idx], idx, **kwargs)
        except CellExecutionError as ce:
          raise TestbookRuntimeError(ce.evalue, ce, self._get_error_class(ce.ename))

E testbook.exceptions.TestbookRuntimeError: An error occurred while executing the following cell:
E ------------------
E
E import shutil
E from merlin.models.loader.tf_utils import configure_tensorflow
E configure_tensorflow()
E from merlin.systems.triton.utils import run_ensemble_on_tritonserver
E response = run_ensemble_on_tritonserver(
E "/tmp/examples/poc_ensemble", outputs, request, "ensemble_model"
E )
E response = [x.tolist()[0] for x in response["ordered_ids"]]
E shutil.rmtree("/tmp/examples/", ignore_errors=True)
E
E ------------------
E
E ---------------------------------------------------------------------------
E InferenceServerException                Traceback (most recent call last)
E Input In [32], in <cell line: 5>()
E       3 configure_tensorflow()
E       4 from merlin.systems.triton.utils import run_ensemble_on_tritonserver
E ----> 5 response = run_ensemble_on_tritonserver(
E       6     "/tmp/examples/poc_ensemble", outputs, request, "ensemble_model"
E       7 )
E       8 response = [x.tolist()[0] for x in response["ordered_ids"]]
E       9 shutil.rmtree("/tmp/examples/", ignore_errors=True)
E
E File /usr/local/lib/python3.8/dist-packages/merlin/systems/triton/utils.py:93, in run_ensemble_on_tritonserver(tmpdir, output_columns, df, model_name)
E      91 response = None
E      92 with run_triton_server(tmpdir) as client:
E ---> 93     response = send_triton_request(df, output_columns, client=client, triton_model=model_name)
E      95 return response
E
E File /usr/local/lib/python3.8/dist-packages/merlin/systems/triton/utils.py:141, in send_triton_request(df, outputs_list, client, endpoint, request_id, triton_model)
E     139 outputs = [grpcclient.InferRequestedOutput(col) for col in outputs_list]
E     140 with client:
E --> 141     response = client.infer(triton_model, inputs, request_id=request_id, outputs=outputs)
E     143 results = {}
E     144 for col in outputs_list:
E
E File /usr/local/lib/python3.8/dist-packages/tritonclient/grpc/__init__.py:1322, in InferenceServerClient.infer(self, model_name, inputs, model_version, outputs, request_id, sequence_id, sequence_start, sequence_end, priority, timeout, client_timeout, headers, compression_algorithm)
E    1320     return result
E    1321 except grpc.RpcError as rpc_error:
E -> 1322     raise_error_grpc(rpc_error)
E
E File /usr/local/lib/python3.8/dist-packages/tritonclient/grpc/__init__.py:62, in raise_error_grpc(rpc_error)
E      61 def raise_error_grpc(rpc_error):
E ---> 62     raise get_error_grpc(rpc_error) from None
E
E InferenceServerException: [StatusCode.INTERNAL] in ensemble 'ensemble_model', Failed to process the request(s) for model instance '3_queryfeast', message: TypeError: __init__(): incompatible constructor arguments. The following argument types are supported:
E 1. c_python_backend_utils.InferenceResponse(output_tensors: List[c_python_backend_utils.Tensor], error: c_python_backend_utils.TritonError = None)
E
E Invoked with: kwargs: tensors=[], error="<class 'TypeError'>, int() argument must be a string, a bytes-like object or a number, not 'NoneType', [<FrameSummary file /tmp/examples/poc_ensemble/3_queryfeast/1/model.py, line 105 in execute>, <FrameSummary file /usr/local/lib/python3.8/dist-packages/merlin/systems/dag/op_runner.py, line 38 in execute>, <FrameSummary file /usr/local/lib/python3.8/dist-packages/merlin/systems/dag/ops/feast.py, line 299 in transform>]"
E
E At:
E /tmp/examples/poc_ensemble/3_queryfeast/1/model.py(122): execute
E
E InferenceServerException: [StatusCode.INTERNAL] in ensemble 'ensemble_model', Failed to process the request(s) for model instance '3_queryfeast', message: TypeError: __init__(): incompatible constructor arguments. The following argument types are supported:
E 1. c_python_backend_utils.InferenceResponse(output_tensors: List[c_python_backend_utils.Tensor], error: c_python_backend_utils.TritonError = None)
E
E Invoked with: kwargs: tensors=[], error="<class 'TypeError'>, int() argument must be a string, a bytes-like object or a number, not 'NoneType', [<FrameSummary file /tmp/examples/poc_ensemble/3_queryfeast/1/model.py, line 105 in execute>, <FrameSummary file /usr/local/lib/python3.8/dist-packages/merlin/systems/dag/op_runner.py, line 38 in execute>, <FrameSummary file /usr/local/lib/python3.8/dist-packages/merlin/systems/dag/ops/feast.py, line 299 in transform>]"
E
E At:
E /tmp/examples/poc_ensemble/3_queryfeast/1/model.py(122): execute

/usr/local/lib/python3.8/dist-packages/testbook/client.py:135: TestbookRuntimeError
----------------------------- Captured stdout call -----------------------------
Signal (2) received.
----------------------------- Captured stderr call -----------------------------
2022-08-05 01:06:20.677109: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-08-05 01:06:22.648143: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 1627 MB memory: -> device: 0, name: Tesla P100-DGXS-16GB, pci bus id: 0000:07:00.0, compute capability: 6.0
2022-08-05 01:06:22.648914: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 15153 MB memory: -> device: 1, name: Tesla P100-DGXS-16GB, pci bus id: 0000:08:00.0, compute capability: 6.0
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
File "/usr/lib/python3.8/logging/init.py", line 2127, in shutdown
h.close()
File "/usr/local/lib/python3.8/dist-packages/absl/logging/init.py", line 934, in close
self.stream.close()
File "/usr/local/lib/python3.8/dist-packages/ipykernel/iostream.py", line 438, in close
self.watch_fd_thread.join()
AttributeError: 'OutStream' object has no attribute 'watch_fd_thread'
WARNING clustering 232 points to 32 centroids: please provide at least 1248 training points
2022-08-05 01:07:54.994646: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-08-05 01:07:56.961940: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 1627 MB memory: -> device: 0, name: Tesla P100-DGXS-16GB, pci bus id: 0000:07:00.0, compute capability: 6.0
2022-08-05 01:07:56.962713: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 15153 MB memory: -> device: 1, name: Tesla P100-DGXS-16GB, pci bus id: 0000:08:00.0, compute capability: 6.0
I0805 01:08:02.114323 15356 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7ff6a4000000' with size 268435456
I0805 01:08:02.115069 15356 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I0805 01:08:02.122362 15356 model_repository_manager.cc:1191] loading: 0_queryfeast:1
I0805 01:08:02.222687 15356 model_repository_manager.cc:1191] loading: 1_predicttensorflow:1
I0805 01:08:02.229839 15356 python.cc:2388] TRITONBACKEND_ModelInstanceInitialize: 0_queryfeast (GPU device 0)
I0805 01:08:02.323001 15356 model_repository_manager.cc:1191] loading: 2_queryfaiss:1
I0805 01:08:02.423214 15356 model_repository_manager.cc:1191] loading: 3_queryfeast:1
I0805 01:08:02.523471 15356 model_repository_manager.cc:1191] loading: 4_unrollfeatures:1
I0805 01:08:02.623749 15356 model_repository_manager.cc:1191] loading: 5_predicttensorflow:1
I0805 01:08:02.724062 15356 model_repository_manager.cc:1191] loading: 6_softmaxsampling:1
I0805 01:08:04.540782 15356 model_repository_manager.cc:1345] successfully loaded '0_queryfeast' version 1
I0805 01:08:04.820139 15356 tensorflow.cc:2181] TRITONBACKEND_Initialize: tensorflow
I0805 01:08:04.820178 15356 tensorflow.cc:2191] Triton TRITONBACKEND API version: 1.9
I0805 01:08:04.820185 15356 tensorflow.cc:2197] 'tensorflow' TRITONBACKEND API version: 1.9
I0805 01:08:04.820191 15356 tensorflow.cc:2221] backend configuration:
{"cmdline":{"auto-complete-config":"false","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","version":"2","default-max-batch-size":"4"}}
I0805 01:08:04.820226 15356 tensorflow.cc:2281] TRITONBACKEND_ModelInitialize: 1_predicttensorflow (version 1)
I0805 01:08:04.824038 15356 tensorflow.cc:2281] TRITONBACKEND_ModelInitialize: 5_predicttensorflow (version 1)
I0805 01:08:04.826052 15356 tensorflow.cc:2330] TRITONBACKEND_ModelInstanceInitialize: 1_predicttensorflow (GPU device 0)
2022-08-05 01:08:05.169694: I tensorflow/cc/saved_model/reader.cc:43] Reading SavedModel from: /tmp/examples/poc_ensemble/1_predicttensorflow/1/model.savedmodel
2022-08-05 01:08:05.173084: I tensorflow/cc/saved_model/reader.cc:78] Reading meta graph with tags { serve }
2022-08-05 01:08:05.173109: I tensorflow/cc/saved_model/reader.cc:119] Reading SavedModel debug info (if present) from: /tmp/examples/poc_ensemble/1_predicttensorflow/1/model.savedmodel
2022-08-05 01:08:05.173201: I tensorflow/core/platform/cpu_feature_guard.cc:152] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE3 SSE4.1 SSE4.2 AVX
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-08-05 01:08:05.208873: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 12648 MB memory: -> device: 0, name: Tesla P100-DGXS-16GB, pci bus id: 0000:07:00.0, compute capability: 6.0
2022-08-05 01:08:05.250890: I tensorflow/cc/saved_model/loader.cc:230] Restoring SavedModel bundle.
2022-08-05 01:08:05.332384: I tensorflow/cc/saved_model/loader.cc:214] Running initialization op on SavedModel bundle at path: /tmp/examples/poc_ensemble/1_predicttensorflow/1/model.savedmodel
2022-08-05 01:08:05.356110: I tensorflow/cc/saved_model/loader.cc:321] SavedModel load for tags { serve }; Status: success: OK. Took 186436 microseconds.
I0805 01:08:05.356222 15356 python.cc:2388] TRITONBACKEND_ModelInstanceInitialize: 2_queryfaiss (GPU device 0)
I0805 01:08:05.356325 15356 model_repository_manager.cc:1345] successfully loaded '1_predicttensorflow' version 1
I0805 01:08:07.722962 15356 python.cc:2388] TRITONBACKEND_ModelInstanceInitialize: 3_queryfeast (GPU device 0)
I0805 01:08:07.724664 15356 model_repository_manager.cc:1345] successfully loaded '2_queryfaiss' version 1
I0805 01:08:10.024390 15356 python.cc:2388] TRITONBACKEND_ModelInstanceInitialize: 4_unrollfeatures (GPU device 0)
I0805 01:08:10.025560 15356 model_repository_manager.cc:1345] successfully loaded '3_queryfeast' version 1
I0805 01:08:12.093607 15356 tensorflow.cc:2330] TRITONBACKEND_ModelInstanceInitialize: 5_predicttensorflow (GPU device 0)
I0805 01:08:12.094010 15356 model_repository_manager.cc:1345] successfully loaded '4_unrollfeatures' version 1
2022-08-05 01:08:12.095230: I tensorflow/cc/saved_model/reader.cc:43] Reading SavedModel from: /tmp/examples/poc_ensemble/5_predicttensorflow/1/model.savedmodel
2022-08-05 01:08:12.113512: I tensorflow/cc/saved_model/reader.cc:78] Reading meta graph with tags { serve }
2022-08-05 01:08:12.113556: I tensorflow/cc/saved_model/reader.cc:119] Reading SavedModel debug info (if present) from: /tmp/examples/poc_ensemble/5_predicttensorflow/1/model.savedmodel
2022-08-05 01:08:12.115580: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 12648 MB memory: -> device: 0, name: Tesla P100-DGXS-16GB, pci bus id: 0000:07:00.0, compute capability: 6.0
2022-08-05 01:08:12.137592: I tensorflow/cc/saved_model/loader.cc:230] Restoring SavedModel bundle.
2022-08-05 01:08:12.294075: I tensorflow/cc/saved_model/loader.cc:214] Running initialization op on SavedModel bundle at path: /tmp/examples/poc_ensemble/5_predicttensorflow/1/model.savedmodel
2022-08-05 01:08:12.346454: I tensorflow/cc/saved_model/loader.cc:321] SavedModel load for tags { serve }; Status: success: OK. Took 251237 microseconds.
I0805 01:08:12.346601 15356 python.cc:2388] TRITONBACKEND_ModelInstanceInitialize: 6_softmaxsampling (GPU device 0)
I0805 01:08:12.346695 15356 model_repository_manager.cc:1345] successfully loaded '5_predicttensorflow' version 1
I0805 01:08:14.457636 15356 model_repository_manager.cc:1345] successfully loaded '6_softmaxsampling' version 1
I0805 01:08:14.460560 15356 model_repository_manager.cc:1191] loading: ensemble_model:1
I0805 01:08:14.561367 15356 model_repository_manager.cc:1345] successfully loaded 'ensemble_model' version 1
I0805 01:08:14.561549 15356 server.cc:556]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0805 01:08:14.561666 15356 server.cc:583]
+------------+-----------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Backend | Path | Config |
+------------+-----------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| python | /opt/tritonserver/backends/python/libtriton_python.so | {"cmdline":{"auto-complete-config":"false","min-compute-capability":"6.000000","backend-directory":"/opt/tritonserver/backends","default-max-batch-size":"4"}} |
| tensorflow | /opt/tritonserver/backends/tensorflow2/libtriton_tensorflow2.so | {"cmdline":{"auto-complete-config":"false","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","version":"2","default-max-batch-size":"4"}} |
+------------+-----------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0805 01:08:14.561780 15356 server.cc:626]
+---------------------+---------+--------+
| Model | Version | Status |
+---------------------+---------+--------+
| 0_queryfeast | 1 | READY |
| 1_predicttensorflow | 1 | READY |
| 2_queryfaiss | 1 | READY |
| 3_queryfeast | 1 | READY |
| 4_unrollfeatures | 1 | READY |
| 5_predicttensorflow | 1 | READY |
| 6_softmaxsampling | 1 | READY |
| ensemble_model | 1 | READY |
+---------------------+---------+--------+

I0805 01:08:14.627974 15356 metrics.cc:650] Collecting metrics for GPU 0: Tesla P100-DGXS-16GB
I0805 01:08:14.628816 15356 tritonserver.cc:2138]
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Value |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.22.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics trace |
| model_repository_path[0] | /tmp/examples/poc_ensemble |
| model_control_mode | MODE_NONE |
| strict_model_config | 1 |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| response_cache_byte_size | 0 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0805 01:08:14.630060 15356 grpc_server.cc:4589] Started GRPCInferenceService at 0.0.0.0:8001
I0805 01:08:14.630598 15356 http_server.cc:3303] Started HTTPService at 0.0.0.0:8000
I0805 01:08:14.671804 15356 http_server.cc:178] Started Metrics Service at 0.0.0.0:8002
W0805 01:08:15.646361 15356 metrics.cc:468] Unable to get energy consumption for GPU 0. Status:Success, value:0
W0805 01:08:15.646433 15356 metrics.cc:507] Unable to get memory usage for GPU 0. Memory usage status:Success, value:0. Memory total status:Success, value:0
W0805 01:08:16.646590 15356 metrics.cc:468] Unable to get energy consumption for GPU 0. Status:Success, value:0
W0805 01:08:16.646644 15356 metrics.cc:507] Unable to get memory usage for GPU 0. Memory usage status:Success, value:0. Memory total status:Success, value:0
W0805 01:08:17.664897 15356 metrics.cc:468] Unable to get energy consumption for GPU 0. Status:Success, value:0
W0805 01:08:17.664951 15356 metrics.cc:507] Unable to get memory usage for GPU 0. Memory usage status:Success, value:0. Memory total status:Success, value:0
0805 01:08:20.377868 15613 pb_stub.cc:749] Failed to process the request(s) for model '3_queryfeast', message: TypeError: __init__(): incompatible constructor arguments. The following argument types are supported:
1. c_python_backend_utils.InferenceResponse(output_tensors: List[c_python_backend_utils.Tensor], error: c_python_backend_utils.TritonError = None)

Invoked with: kwargs: tensors=[], error="<class 'TypeError'>, int() argument must be a string, a bytes-like object or a number, not 'NoneType', [<FrameSummary file /tmp/examples/poc_ensemble/3_queryfeast/1/model.py, line 105 in execute>, <FrameSummary file /usr/local/lib/python3.8/dist-packages/merlin/systems/dag/op_runner.py, line 38 in execute>, <FrameSummary file /usr/local/lib/python3.8/dist-packages/merlin/systems/dag/ops/feast.py, line 299 in transform>]"

At:
/tmp/examples/poc_ensemble/3_queryfeast/1/model.py(122): execute

I0805 01:08:20.378832 15356 server.cc:257] Waiting for in-flight requests to complete.
I0805 01:08:20.378878 15356 server.cc:273] Timeout 30: Found 0 model versions that have in-flight inferences
I0805 01:08:20.378897 15356 model_repository_manager.cc:1223] unloading: ensemble_model:1
I0805 01:08:20.378991 15356 model_repository_manager.cc:1223] unloading: 6_softmaxsampling:1
I0805 01:08:20.379058 15356 model_repository_manager.cc:1223] unloading: 5_predicttensorflow:1
I0805 01:08:20.379149 15356 model_repository_manager.cc:1223] unloading: 4_unrollfeatures:1
I0805 01:08:20.379160 15356 model_repository_manager.cc:1328] successfully unloaded 'ensemble_model' version 1
I0805 01:08:20.379210 15356 model_repository_manager.cc:1223] unloading: 3_queryfeast:1
I0805 01:08:20.379253 15356 tensorflow.cc:2368] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I0805 01:08:20.379298 15356 model_repository_manager.cc:1223] unloading: 2_queryfaiss:1
I0805 01:08:20.379365 15356 model_repository_manager.cc:1223] unloading: 1_predicttensorflow:1
I0805 01:08:20.379377 15356 tensorflow.cc:2307] TRITONBACKEND_ModelFinalize: delete model state
I0805 01:08:20.379442 15356 model_repository_manager.cc:1223] unloading: 0_queryfeast:1
I0805 01:08:20.379487 15356 server.cc:288] All models are stopped, unloading models
I0805 01:08:20.379510 15356 server.cc:295] Timeout 30: Found 7 live models and 0 in-flight non-inference requests
I0805 01:08:20.379558 15356 tensorflow.cc:2368] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I0805 01:08:20.379669 15356 tensorflow.cc:2307] TRITONBACKEND_ModelFinalize: delete model state
I0805 01:08:20.391350 15356 model_repository_manager.cc:1328] successfully unloaded '1_predicttensorflow' version 1
I0805 01:08:20.399540 15356 model_repository_manager.cc:1328] successfully unloaded '5_predicttensorflow' version 1
I0805 01:08:21.379635 15356 server.cc:295] Timeout 29: Found 5 live models and 0 in-flight non-inference requests
I0805 01:08:21.760605 15356 model_repository_manager.cc:1328] successfully unloaded '6_softmaxsampling' version 1
I0805 01:08:21.844817 15356 model_repository_manager.cc:1328] successfully unloaded '4_unrollfeatures' version 1
I0805 01:08:22.016759 15356 model_repository_manager.cc:1328] successfully unloaded '2_queryfaiss' version 1
I0805 01:08:22.379857 15356 server.cc:295] Timeout 28: Found 2 live models and 0 in-flight non-inference requests
I0805 01:08:23.379994 15356 server.cc:295] Timeout 27: Found 2 live models and 0 in-flight non-inference requests
I0805 01:08:24.380145 15356 server.cc:295] Timeout 26: Found 2 live models and 0 in-flight non-inference requests
I0805 01:08:25.380280 15356 server.cc:295] Timeout 25: Found 2 live models and 0 in-flight non-inference requests
I0805 01:08:26.380419 15356 server.cc:295] Timeout 24: Found 2 live models and 0 in-flight non-inference requests
I0805 01:08:27.380554 15356 server.cc:295] Timeout 23: Found 2 live models and 0 in-flight non-inference requests
I0805 01:08:28.380690 15356 server.cc:295] Timeout 22: Found 2 live models and 0 in-flight non-inference requests
I0805 01:08:29.380823 15356 server.cc:295] Timeout 21: Found 2 live models and 0 in-flight non-inference requests
I0805 01:08:30.380959 15356 server.cc:295] Timeout 20: Found 2 live models and 0 in-flight non-inference requests
/usr/local/lib/python3.8/dist-packages/merlin/systems/dag/ops/feast.py:15: DeprecationWarning: np.float is a deprecated alias for the builtin float. To silence this warning, use float by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use np.float64 here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
ValueType.FLOAT: (np.float, False, False),
I0805 01:08:30.807688 15356 model_repository_manager.cc:1328] successfully unloaded '0_queryfeast' version 1
/usr/local/lib/python3.8/dist-packages/merlin/systems/dag/ops/feast.py:15: DeprecationWarning: np.float is a deprecated alias for the builtin float. To silence this warning, use float by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use np.float64 here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
ValueType.FLOAT: (np.float, False, False),
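The DeprecationWarning above comes from the value-type map in merlin/systems/dag/ops/feast.py; following the NumPy guidance quoted in the warning, the alias would be swapped for the builtin float or the explicit scalar type. A sketch of that substitution (illustrative only, not the actual patch shipped in Merlin):

import numpy as np

# np.float was merely an alias for the builtin float (deprecated in NumPy 1.20,
# removed in 1.24); np.float64 is the explicit scalar type if one is wanted.
value_type_map = {
    "FLOAT": (np.float64, False, False),  # instead of (np.float, False, False)
}
print(value_type_map["FLOAT"][0](1.5))  # 1.5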
I0805 01:08:31.276899 15356 model_repository_manager.cc:1328] successfully unloaded '3_queryfeast' version 1
I0805 01:08:31.381090 15356 server.cc:295] Timeout 19: Found 0 live models and 0 in-flight non-inference requests
=========================== short test summary info ============================
FAILED tests/unit/examples/test_building_deploying_multi_stage_RecSys.py::test_func
=================== 1 failed, 2 passed in 239.47s (0:03:59) ====================
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://github.com/gitapi/repos/NVIDIA-Merlin/Merlin/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_merlin] $ /bin/bash /tmp/jenkins1131940387351159900.sh

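The root cause of the failed run above is spelled out in the TypeError: the generated /tmp/examples/poc_ensemble/3_queryfeast/1/model.py constructs an error response with a tensors= keyword, while the Triton Python backend's InferenceResponse accepts only output_tensors (plus an optional error), exactly as the log reports. A minimal sketch of the mismatch, using stand-in classes because c_python_backend_utils is importable only inside the Triton Python backend:

from typing import List, Optional

class Tensor:  # stand-in for c_python_backend_utils.Tensor
    pass

class TritonError:  # stand-in for c_python_backend_utils.TritonError
    def __init__(self, message: str):
        self.message = message

class InferenceResponse:
    # The only supported signature, per the TypeError in the log above.
    def __init__(self, output_tensors: List[Tensor], error: Optional[TritonError] = None):
        self.output_tensors = output_tensors
        self.error = error

# What the generated model.py evidently passes -- fails: there is no 'tensors' kwarg.
try:
    InferenceResponse(tensors=[], error=TritonError("..."))
except TypeError as exc:
    print("reproduces the failure:", exc)

# What the reported signature actually accepts:
ok = InferenceResponse(output_tensors=[], error=TritonError("wrapped backend error"))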
@mikemckiernan
Member

rerun tests

@nvidia-merlin-bot
Contributor Author

Click to view CI Results
GitHub pull request #500 of commit 371f0d16520f98f994078662ba619f081a2d6fe7, no merge conflicts.
Running as SYSTEM
Setting status of 371f0d16520f98f994078662ba619f081a2d6fe7 to PENDING with url https://10.20.13.93:8080/job/merlin_merlin/315/console and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_merlin
using credential systems-login
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/Merlin # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/Merlin
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/Merlin +refs/pull/500/*:refs/remotes/origin/pr/500/* # timeout=10
 > git rev-parse 371f0d16520f98f994078662ba619f081a2d6fe7^{commit} # timeout=10
Checking out Revision 371f0d16520f98f994078662ba619f081a2d6fe7 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 371f0d16520f98f994078662ba619f081a2d6fe7 # timeout=10
Commit message: "Updates from containers"
 > git rev-list --no-walk 371f0d16520f98f994078662ba619f081a2d6fe7 # timeout=10
[merlin_merlin] $ /bin/bash /tmp/jenkins1937438013079136360.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/merlin_merlin/merlin
plugins: anyio-3.6.1, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 3 items

tests/unit/test_version.py . [ 33%]
tests/unit/examples/test_building_deploying_multi_stage_RecSys.py . [ 66%]
tests/unit/examples/test_scaling_criteo_merlin_models.py . [100%]

======================== 3 passed in 238.44s (0:03:58) =========================
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://github.com/gitapi/repos/NVIDIA-Merlin/Merlin/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_merlin] $ /bin/bash /tmp/jenkins8457370189773055265.sh

@viswa-nvidia viswa-nvidia added this to the Merlin 22.08 milestone Aug 5, 2022
@benfred benfred merged commit af13180 into main Aug 5, 2022
@mikemckiernan mikemckiernan deleted the docs-smx-2206-1 branch November 16, 2022 16:35