Merge branch 'PMPY-2097' into 'integration'
PMPY-2097 Additional OpenAI queries

Closes PMPY-2097

See merge request process-mining/pm4py/pm4py-core!996
fit-daniel-schuster committed May 8, 2023
2 parents 5140f4b + 08bf7c5 commit 3e88d92
Showing 10 changed files with 791 additions and 549 deletions.
36 changes: 5 additions & 31 deletions docs/source/api.rst
Original file line number Diff line number Diff line change
@@ -293,29 +293,14 @@ Some object-centric process discovery algorithms are also offered:
OpenAI Integration (:mod:`pm4py.openai`)
------------------------------------------

We offer some integrations with OpenAI (e.g., ChatGPT) to automatically obtain insights:

* :meth:`pm4py.openai.describe_process`; provides domain knowledge about the process
* :meth:`pm4py.openai.describe_path`; provides domain knowledge about a path of the process
* :meth:`pm4py.openai.describe_activity`; provides domain knowledge about an activity of the process
* :meth:`pm4py.openai.describe_variant`; describes a given variant, providing insights on its anomalies
* :meth:`pm4py.openai.suggest_improvements`; suggests some improvements for the process starting from its event log
* :meth:`pm4py.openai.root_cause_analysis`; performs a root cause analysis of the conformance/performance issues
* :meth:`pm4py.openai.code_for_log_generation`; generates an event log given the name of a process (e.g., Purchase-to-Pay)
* :meth:`pm4py.openai.compare_logs`; describes the differences between two event logs
* :meth:`pm4py.openai.anomaly_detection`; describes the main anomalies of the provided event log
* :meth:`pm4py.openai.suggest_clusters`; suggests groups of variants based on their behavior
* :meth:`pm4py.openai.conformance_checking`; performs conformance checking against the provided log and rule
* :meth:`pm4py.openai.suggest_verify_hypotheses`; given an event log, provides some hypotheses for the analysis and allows verifying them
* :meth:`pm4py.openai.filtering_query`; given an event log and a natural language query, translates it into a SQL query
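All of the queries above follow the same pattern: build a textual abstraction of the log, append a natural-language question, and either return the prompt or send it to OpenAI. A minimal self-contained sketch (the function and sample strings are illustrative, not the actual pm4py implementation):

```python
# Hedged sketch of the shared prompt-building pattern; build_query and the
# sample abstraction are hypothetical names, not part of the pm4py API.
def build_query(abstraction, question, api_key=None):
    prompt = abstraction + "\n\n" + question
    if api_key is None:
        # mirrors pm4py's behavior: without an API key, the prompt is returned
        # so the user can inspect it or paste it into ChatGPT manually
        return prompt
    raise NotImplementedError("here the prompt would be sent to the OpenAI API")


dfg_abstraction = "Create Order -> Ship Order (50 occurrences)"
prompt = build_query(dfg_abstraction, "Which paths look anomalous, and why?")
```

Returning the prompt when no key is configured also makes the queries easy to unit-test offline.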


The following methods provide just the abstractions of the given objects:

* :meth:`pm4py.openai.abstract_dfg`; provides the DFG abstraction of a traditional event log
* :meth:`pm4py.openai.abstract_variants`; provides the variants abstraction of a traditional event log
* :meth:`pm4py.openai.abstract_log_attributes`; provides the abstraction of the attributes/columns of the event log
* :meth:`pm4py.openai.abstract_ocel`; provides the abstraction of an object-centric event log
* :meth:`pm4py.openai.abstract_ocel`; provides the abstraction of an object-centric event log (list of events and objects)
* :meth:`pm4py.openai.abstract_ocel_ocdfg`; provides the abstraction of an object-centric event log (OC-DFG)
* :meth:`pm4py.openai.abstract_ocel_features`; provides the abstraction of an object-centric event log (features for ML)
* :meth:`pm4py.openai.abstract_event_stream`; provides an abstraction of the (last) events of the stream related to a traditional event log
* :meth:`pm4py.openai.abstract_petri_net`; provides the abstraction of a Petri net

@@ -575,25 +560,14 @@ Overall List of Methods
pm4py.ocel.ocel_e2o_lifecycle_enrichment
pm4py.ocel.cluster_equivalent_ocel
pm4py.openai
pm4py.openai.describe_process
pm4py.openai.describe_path
pm4py.openai.describe_activity
pm4py.openai.suggest_improvements
pm4py.openai.code_for_log_generation
pm4py.openai.root_cause_analysis
pm4py.openai.describe_variant
pm4py.openai.compare_logs
pm4py.openai.abstract_dfg
pm4py.openai.abstract_variants
pm4py.openai.abstract_ocel
pm4py.openai.anomaly_detection
pm4py.openai.suggest_clusters
pm4py.openai.conformance_checking
pm4py.openai.suggest_verify_hypotheses
pm4py.openai.abstract_ocel_ocdfg
pm4py.openai.abstract_ocel_features
pm4py.openai.abstract_event_stream
pm4py.openai.abstract_petri_net
pm4py.openai.abstract_log_attributes
pm4py.openai.filtering_query
pm4py.connectors.extract_log_outlook_mails
pm4py.connectors.extract_log_outlook_calendar
pm4py.connectors.extract_log_windows_events
8 changes: 1 addition & 7 deletions examples/execute_everything.py
@@ -22,12 +22,6 @@ def ocel_enrichment():
ocel_enrichment.execute_script()


def openai_log_queries():
    from examples import openai_log_queries
    print("\n\nopenai_log_queries")
    openai_log_queries.execute_script()


def validation_ocel20_xml():
    from examples import validation_ocel20_xml
    print("\n\nvalidation_ocel20_xml")
@@ -781,7 +775,7 @@ def execute_script(f):
execute_script(ocel_occm_example)
execute_script(ocel_clustering)
execute_script(ocel_enrichment)
execute_script(openai_log_queries)
execute_script(openai_queries)
execute_script(validation_ocel20_xml)
execute_script(consecutive_act_case_grouping_filter)
execute_script(cost_based_dfg)
@@ -1,7 +1,11 @@
from typing import Optional, Dict, Any, Collection
import pandas as pd
from pm4py.objects.log.obj import EventLog, EventStream
from pm4py.objects.ocel.obj import OCEL
from pm4py.algo.querying.openai import log_to_dfg_descr, log_to_variants_descr, log_to_cols_descr
from pm4py.algo.querying.openai import stream_to_descr
from pm4py.algo.transformation.ocel.description import algorithm as ocel_description
from pm4py.algo.querying.openai import ocel_ocdfg_descr, ocel_fea_descr
from pm4py.algo.querying.openai import perform_query
from pm4py.objects.conversion.log import converter as log_converter
from typing import Union, Tuple
@@ -18,10 +18,12 @@ class Parameters(Enum):

AVAILABLE_LOG_QUERIES = ["describe_process", "describe_path", "describe_activity", "suggest_improvements", "code_for_log_generation",
                         "root_cause_analysis", "describe_variant", "compare_logs", "anomaly_detection", "suggest_clusters",
                         "conformance_checking", "suggest_verify_hypotheses", "filtering_query"]
                         "conformance_checking", "suggest_verify_hypotheses", "filtering_query",
                         "abstract_dfg", "abstract_variants", "abstract_columns", "abstract_ocel", "abstract_stream",
                         "abstract_ocel_ocdfg", "abstract_ocel_features"]


def query_wrapper(log_obj: Union[pd.DataFrame, EventLog, EventStream], type: str, args: Optional[Dict[Any, Any]] = None, parameters: Optional[Dict[Any, Any]] = None) -> str:
def query_wrapper(log_obj: Union[pd.DataFrame, EventLog, EventStream, OCEL], type: str, args: Optional[Dict[Any, Any]] = None, parameters: Optional[Dict[Any, Any]] = None) -> str:
    if parameters is None:
        parameters = {}

@@ -36,6 +42,8 @@ def query_wrapper(log_obj: Union[pd.DataFrame, EventLog, EventStream], type: str
        return describe_activity(log_obj, args["activity"], parameters=parameters)
    elif type == "suggest_improvements":
        return suggest_improvements(log_obj, parameters=parameters)
    elif type == "anomalous_paths":
        return anomalous_paths(log_obj, parameters=parameters)
    elif type == "code_for_log_generation":
        return code_for_log_generation(args["desired_process"], parameters=parameters)
    elif type == "root_cause_analysis":
@@ -54,6 +62,20 @@ def query_wrapper(log_obj: Union[pd.DataFrame, EventLog, EventStream], type: str
        return suggest_verify_hypotheses(log_obj, parameters=parameters)
    elif type == "filtering_query":
        return filtering_query(log_obj, args["query"], parameters=parameters)
    elif type == "abstract_dfg":
        return log_to_dfg_descr.apply(log_obj, parameters=parameters)
    elif type == "abstract_variants":
        return log_to_variants_descr.apply(log_obj, parameters=parameters)
    elif type == "abstract_columns":
        return log_to_cols_descr.apply(log_obj, parameters=parameters)
    elif type == "abstract_ocel":
        return ocel_description.apply(log_obj, parameters=parameters)
    elif type == "abstract_stream":
        return stream_to_descr.apply(log_obj, parameters=parameters)
    elif type == "abstract_ocel_ocdfg":
        return ocel_ocdfg_descr.apply(log_obj, parameters=parameters)
    elif type == "abstract_ocel_features":
        return ocel_fea_descr.apply(log_obj, args["type"], parameters=parameters)
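The growing if/elif chain in query_wrapper can equivalently be expressed as a lookup table. A self-contained sketch of that design choice, using illustrative stub handlers rather than the real pm4py functions:

```python
# Dispatch-table sketch; describe_process/abstract_dfg here are toy stubs
# standing in for the real pm4py handlers.
def describe_process(log_obj, parameters=None):
    return "describe:" + str(log_obj)

def abstract_dfg(log_obj, parameters=None):
    return "dfg:" + str(log_obj)

HANDLERS = {
    "describe_process": describe_process,
    "abstract_dfg": abstract_dfg,
}

def dispatch_query(log_obj, query_type, parameters=None):
    # unknown types fail loudly instead of silently returning None
    if query_type not in HANDLERS:
        raise ValueError("unknown query type: " + query_type)
    return HANDLERS[query_type](log_obj, parameters=parameters or {})
```

A table also keeps AVAILABLE_LOG_QUERIES in sync automatically (it is just `list(HANDLERS)`).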


def describe_process(log_obj: Union[pd.DataFrame, EventLog, EventStream], parameters: Optional[Dict[Any, Any]] = None) -> str:
@@ -132,6 +154,25 @@ def suggest_improvements(log_obj: Union[pd.DataFrame, EventLog, EventStream], pa
    return perform_query.apply(query, parameters=parameters)


def anomalous_paths(log_obj: Union[pd.DataFrame, EventLog, EventStream], parameters: Optional[Dict[Any, Any]] = None) -> str:
    if parameters is None:
        parameters = {}

    log_obj = log_converter.apply(log_obj, variant=log_converter.Variants.TO_DATA_FRAME, parameters=parameters)
    activity_key = exec_utils.get_param_value(Parameters.ACTIVITY_KEY, parameters, xes_constants.DEFAULT_NAME_KEY)

    api_key = exec_utils.get_param_value(Parameters.API_KEY, parameters, constants.OPENAI_API_KEY)
    execute_query = exec_utils.get_param_value(Parameters.EXECUTE_QUERY, parameters, api_key is not None)

    query = log_to_dfg_descr.apply(log_obj, parameters=parameters)
    query += "Which paths included in the directly-follows graph look the most anomalous? Could you also explain why they are anomalous? Please provide only data- and process-specific considerations, not general ones."

    if not execute_query:
        return query

    return perform_query.apply(query, parameters=parameters)
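Every query function ends with the same execute_query convention: the query runs against the API only when a key is available, otherwise the prompt text itself is returned. A minimal self-contained sketch (run_or_return is a hypothetical name):

```python
# Sketch of the execute_query default seen in anomalous_paths and the other
# helpers; not the real pm4py code.
def run_or_return(query, api_key=None, execute_query=None):
    if execute_query is None:
        # default mirrors the code above: execute only if an API key is set
        execute_query = api_key is not None
    if not execute_query:
        return query  # caller can inspect or submit the prompt manually
    raise NotImplementedError("here perform_query.apply would be called")
```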


def code_for_log_generation(desired_process: str, parameters: Optional[Dict[Any, Any]] = None) -> str:
    if parameters is None:
        parameters = {}
60 changes: 60 additions & 0 deletions examples/openai/net_queries.py
@@ -0,0 +1,60 @@
from pm4py.objects.petri_net.obj import PetriNet, Marking
from typing import Optional, Dict, Any
from pm4py.algo.querying.openai import net_to_descr
from pm4py.algo.querying.openai import perform_query
from enum import Enum
from pm4py.util import exec_utils, constants


class Parameters(Enum):
    EXECUTE_QUERY = "execute_query"
    API_KEY = "api_key"
    EXEC_RESULT = "exec_result"


AVAILABLE_QUERIES = ["petri_diff_with_de_jure", "petri_describe_process"]


def query_wrapper(net: PetriNet, im: Marking, fm: Marking, type: str, args: Optional[Dict[Any, Any]] = None, parameters: Optional[Dict[Any, Any]] = None) -> str:
    if parameters is None:
        parameters = {}

    if args is None:
        args = {}

    if type == "petri_diff_with_de_jure":
        return petri_diff_with_de_jure(net, im, fm, parameters=parameters)
    elif type == "petri_describe_process":
        return petri_describe_process(net, im, fm, parameters=parameters)


def petri_diff_with_de_jure(net: PetriNet, im: Marking, fm: Marking, parameters: Optional[Dict[Any, Any]] = None) -> str:
    if parameters is None:
        parameters = {}

    api_key = exec_utils.get_param_value(Parameters.API_KEY, parameters, constants.OPENAI_API_KEY)
    execute_query = exec_utils.get_param_value(Parameters.EXECUTE_QUERY, parameters, api_key is not None)

    query = net_to_descr.apply(net, im, fm, parameters=parameters)
    query += "What are the differences between this process model and the model you would expect (de-jure model) for the same process? Please provide only data- or process-specific information; if the context is insufficient, do not report any general considerations."

    if not execute_query:
        return query

    return perform_query.apply(query, parameters=parameters)


def petri_describe_process(net: PetriNet, im: Marking, fm: Marking, parameters: Optional[Dict[Any, Any]] = None) -> str:
    if parameters is None:
        parameters = {}

    api_key = exec_utils.get_param_value(Parameters.API_KEY, parameters, constants.OPENAI_API_KEY)
    execute_query = exec_utils.get_param_value(Parameters.EXECUTE_QUERY, parameters, api_key is not None)

    query = net_to_descr.apply(net, im, fm, parameters=parameters)
    query += "Could you describe the process represented in this Petri net?"

    if not execute_query:
        return query

    return perform_query.apply(query, parameters=parameters)
53 changes: 0 additions & 53 deletions examples/openai_log_queries.py

This file was deleted.

1 change: 1 addition & 0 deletions pm4py/algo/querying/openai/net_to_descr.py
@@ -40,5 +40,6 @@ def apply(net: PetriNet, im: Marking, fm: Marking, parameters: Optional[Dict[Any
    ret.append(repr(net))
    ret.append("\ninitial marking: "+repr(im))
    ret.append("final marking: "+repr(fm))
    ret.append("\n")

    return "\n".join(ret)
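The net_to_descr hunk above assembles the Petri net abstraction as a plain string: the net's repr followed by its initial and final markings. A self-contained sketch of that assembly (net_description and the string arguments are illustrative):

```python
# Hedged sketch of the string layout net_to_descr.apply produces; the real
# function takes a PetriNet and Markings, here we pass their reprs directly.
def net_description(net_repr, im_repr, fm_repr):
    parts = [net_repr,
             "\ninitial marking: " + im_repr,
             "final marking: " + fm_repr,
             "\n"]  # trailing newline separates the abstraction from the question
    return "\n".join(parts)
```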
101 changes: 101 additions & 0 deletions pm4py/algo/querying/openai/ocel_fea_descr.py
@@ -0,0 +1,101 @@
from pm4py.objects.ocel.obj import OCEL
from typing import Optional, Dict, Any
from pm4py.util import exec_utils, constants, xes_constants
from enum import Enum
import numpy as np


class Parameters(Enum):
    INCLUDE_HEADER = "include_header"
    MAX_LEN = "max_len"


def __transform_to_string(stru: str) -> str:
    if stru.startswith("@@ocel_lif_activity_"):
        return "Number of occurrences of the activity "+stru.split("@@ocel_lif_activity_")[1]
    elif stru.startswith("@@object_lifecycle_unq_act"):
        return "Number of unique activities in the lifecycle of the object"
    elif stru.startswith("@@object_lifecycle_length"):
        return "Number of events in the lifecycle of the object"
    elif stru.startswith("@@object_lifecycle_duration"):
        return "Duration of the lifecycle of the object"
    elif stru.startswith("@@object_lifecycle_start_timestamp"):
        return "Start timestamp of the lifecycle of the object"
    elif stru.startswith("@@object_lifecycle_end_timestamp"):
        return "Completion timestamp of the lifecycle of the object"
    elif stru.startswith("@@object_degree_centrality"):
        return "Degree centrality of the object in the object interaction graph"
    elif stru.startswith("@@object_general_interaction_graph"):
        return "Number of objects related in the object interaction graph"
    elif stru.startswith("@@object_general_descendants_graph_descendants"):
        return "Number of objects which follow the current object in the object descendants graph"
    elif stru.startswith("@@object_general_inheritance_graph_ascendants"):
        return "Number of objects which precede the current object in the object inheritance graph"
    elif stru.startswith("@@object_general_descendants_graph_ascendants"):
        return "Number of objects which precede the current object in the object descendants graph"
    elif stru.startswith("@@object_general_inheritance_graph_descendants"):
        return "Number of objects which follow the current object in the object inheritance graph"
    elif stru.startswith("@@object_cobirth"):
        return "Number of objects starting their lifecycle together with the current object"
    elif stru.startswith("@@object_codeath"):
        return "Number of objects ending their lifecycle together with the current object"
    elif stru.startswith("@@object_interaction_graph_"):
        return "Number of objects of type "+stru.split("@@object_interaction_graph_")[1]+" related to the current object in the object interaction graph"
    elif stru.startswith("@@ocel_lif_path_"):
        path = stru.split("@@ocel_lif_path_")[1]
        act1 = path.split("##")[0]
        act2 = path.split("##")[1]
        return "Frequency of the path \""+act1+"\" -> \""+act2+"\" in the lifecycle of the object"

    # fallback: return the raw feature name instead of printing and returning None
    return stru
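The prefix-matching chain in __transform_to_string can also be written as an ordered prefix table, which keeps the mapping data separate from the lookup logic. A self-contained sketch with a few of the mappings (transform_to_string and FEATURE_PREFIXES are illustrative names, not the real pm4py helper):

```python
# Ordered prefix table: a few of the feature-name mappings from the code above.
FEATURE_PREFIXES = [
    ("@@ocel_lif_activity_", "Number of occurrences of the activity {rest}"),
    ("@@object_lifecycle_length", "Number of events in the lifecycle of the object"),
    ("@@object_lifecycle_duration", "Duration of the lifecycle of the object"),
]

def transform_to_string(name):
    # order matters: more specific prefixes must come before shared stems
    for prefix, template in FEATURE_PREFIXES:
        if name.startswith(prefix):
            return template.format(rest=name[len(prefix):])
    return name  # fall back to the raw feature name
```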


def apply(ocel: OCEL, obj_type: str, parameters: Optional[Dict[Any, Any]] = None) -> str:
    if parameters is None:
        parameters = {}

    include_header = exec_utils.get_param_value(Parameters.INCLUDE_HEADER, parameters, True)
    max_len = exec_utils.get_param_value(Parameters.MAX_LEN, parameters, constants.OPENAI_MAX_LEN)

    import pm4py

    fea_df = pm4py.extract_ocel_features(ocel, obj_type, include_obj_id=False)

    cols = []

    for c in fea_df.columns:
        ser = fea_df[c]
        ser1 = ser[ser != 0]
        if len(ser1) > 0:
            desc = __transform_to_string(c)
            avg = np.average(ser1)
            stdavg = 0 if avg == 0 or len(ser1) == 1 else np.std(ser1)/avg
            cols.append([desc, len(ser1), stdavg, ser1])

    cols = sorted(cols, key=lambda x: (x[1], x[2], x[0]), reverse=True)

    ret = ["\n"]

    if include_header:
        ret.append("Beforehand, some notions.")
        ret.append("Given an object-centric event log, the object interaction graph connects objects that are related in at least one event.")
        ret.append("The object descendants graph connects objects related in at least one event, when the lifecycle of the second object starts after the lifecycle of the first.")
        ret.append("The object inheritance graph connects two objects when there is an event that ends the lifecycle of the first object and starts the lifecycle of the second.")
        ret.append("\n\n")
        ret.append("Given the following features:\n\n")

    ret = " ".join(ret)

    i = 0
    while i < len(cols):
        if len(ret) >= max_len:
            break

        stru = cols[i][0]+": number of non-zero values: "+str(cols[i][1])+" ; quantiles of the non-zero: "+str(cols[i][3].quantile([0.0, 0.25, 0.5, 0.75, 1.0]).to_dict())+"\n"
        ret = ret + stru

        i = i + 1

    return ret
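The while loop at the end of apply implements a simple length budget: feature lines are appended in relevance order until the textual abstraction reaches max_len, keeping the prompt within the model's context limit. A self-contained sketch of that loop (build_abstraction is a hypothetical name):

```python
# Length-budgeting sketch mirroring the loop in apply above.
def build_abstraction(lines, header="", max_len=10000):
    ret = header
    for line in lines:
        if len(ret) >= max_len:
            break  # stop once the character budget is exhausted
        ret += line + "\n"
    return ret
```

Because the budget is checked before each append, the result can exceed max_len by at most one line.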

