CSRE System Pseudo-Code

Init:
    CSRE_Init()
    ImportLibraries(logging, SentenceTransformer)
    LoggingSetup(level=logging.INFO)

Segmentation (SGMT):
    Input: Document_Text
    Execute: Segmenter(method=sentence_splitting).segment(Document_Text)
    Output: Segmented_Sentences

FeatureExtraction (FE):
    Input: Segmented_Sentences
    Execute: FeatureExtractor(extractor=tfidf_extractor).extract(Segmented_Sentences)
    Output: Features_Extracted

VectorEncoding (VE):
    Input: Features_Extracted
    Execute: VectorEncoder(encoder=doc2vec_encoder).encode(Features_Extracted)
    Output: Encoded_Vectors

VectorRepresentation (VR):
    Input: Encoded_Vectors
    Execute: VectorRepresenter(embedding_method=pca_reduction).embed(Encoded_Vectors)
    Output: Embedded_Vectors

Aggregation (AGG):
    Input: Embedded_Vectors
    Execute: Aggregator(aggregation_method=mean_aggregation).aggregate(Embedded_Vectors)
    Output: Document_Vector

Output:
    Return: Document_Vector
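
The stages above can be wired together roughly as follows. This is a minimal Python sketch, not the author's implementation. The class names, method names, and keyword arguments follow the pseudo-code, but the function name csre_pipeline and the stand-ins toy_extractor, identity_encoder, and keep_first_dim are hypothetical. They only hold the places where TF-IDF, doc2vec, and PCA would go, so the block runs with the standard library alone.

```python
import logging
import re
from typing import Callable, List

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("csre")

Vector = List[float]


class Segmenter:
    def __init__(self, method: Callable[[str], List[str]]):
        self.method = method

    def segment(self, text: str) -> List[str]:
        return self.method(text)


class FeatureExtractor:
    def __init__(self, extractor: Callable[[List[str]], List[Vector]]):
        self.extractor = extractor

    def extract(self, sentences: List[str]) -> List[Vector]:
        return self.extractor(sentences)


class VectorEncoder:
    def __init__(self, encoder: Callable[[List[Vector]], List[Vector]]):
        self.encoder = encoder

    def encode(self, features: List[Vector]) -> List[Vector]:
        return self.encoder(features)


class VectorRepresenter:
    def __init__(self, embedding_method: Callable[[List[Vector]], List[Vector]]):
        self.embedding_method = embedding_method

    def embed(self, vectors: List[Vector]) -> List[Vector]:
        return self.embedding_method(vectors)


class Aggregator:
    def __init__(self, aggregation_method: Callable[[List[Vector]], Vector]):
        self.aggregation_method = aggregation_method

    def aggregate(self, vectors: List[Vector]) -> Vector:
        return self.aggregation_method(vectors)


# --- Hypothetical stand-ins so the sketch runs without third-party packages ---

def sentence_splitting(text: str) -> List[str]:
    # Split on runs of ., ! or ? and drop empty pieces.
    return [p.strip() for p in re.split(r"[.!?]+", text) if p.strip()]


def toy_extractor(sentences: List[str]) -> List[Vector]:
    # NOT TF-IDF: [word count, character count] per sentence.
    return [[float(len(s.split())), float(len(s))] for s in sentences]


def identity_encoder(features: List[Vector]) -> List[Vector]:
    # Placeholder where a doc2vec encoder would go; copies vectors unchanged.
    return [list(v) for v in features]


def keep_first_dim(vectors: List[Vector]) -> List[Vector]:
    # Placeholder where PCA would go; keeps only the first dimension.
    return [v[:1] for v in vectors]


def mean_aggregation(vectors: List[Vector]) -> Vector:
    if not vectors:
        raise ValueError("cannot aggregate an empty list of vectors")
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def csre_pipeline(document_text: str) -> Vector:
    sentences = Segmenter(method=sentence_splitting).segment(document_text)
    logger.info("segmented into %d sentence(s)", len(sentences))
    features = FeatureExtractor(extractor=toy_extractor).extract(sentences)
    encoded = VectorEncoder(encoder=identity_encoder).encode(features)
    embedded = VectorRepresenter(embedding_method=keep_first_dim).embed(encoded)
    return Aggregator(aggregation_method=mean_aggregation).aggregate(embedded)
```

With these stand-ins, csre_pipeline("One two three. Four five.") returns [2.5], the mean of the two word counts. Because every stage takes its strategy as a callable, a real TF-IDF, doc2vec, or PCA function can be swapped in without changing the wiring.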
Component Specifications
Segmenter: Splits document into sentences.
FeatureExtractor: Extracts features (e.g., TF-IDF) from each sentence.
VectorEncoder: Encodes the features into vectors using a method like doc2vec.
VectorRepresenter: Embeds vectors into a lower-dimensional space (e.g., PCA).
Aggregator: Aggregates the embeddings to form a single vector representation of the document.
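
As an illustration of the FeatureExtractor specification, here is one way a tfidf_extractor could look in plain Python. It is a sketch under assumptions: the tokenize helper, the term-frequency definition (count divided by sentence length), and the smoothed idf formula are choices made for this example, and the source does not say which TF-IDF variant is intended.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def tokenize(sentence: str) -> List[str]:
    # Hypothetical tokenizer: lowercase runs of letters, digits, and apostrophes.
    return re.findall(r"[a-z0-9']+", sentence.lower())


def tfidf_extractor(sentences: List[str]) -> List[Dict[str, float]]:
    """Return one {term: weight} mapping per sentence.

    Assumed variant: tf = count / sentence length,
    idf = ln((1 + N) / (1 + df)) + 1, where N is the number of
    sentences and df is the number of sentences containing the term.
    """
    tokenized = [tokenize(s) for s in sentences]
    n = len(tokenized)
    # Document frequency: count each term once per sentence.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    features = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # guard against empty sentences
        features.append({
            term: (count / total) * (math.log((1 + n) / (1 + df[term])) + 1.0)
            for term, count in counts.items()
        })
    return features


if __name__ == "__main__":
    for weights in tfidf_extractor(["the cat sat", "the dog sat down"]):
        print(weights)
```

Terms that occur in every sentence (here "the" and "sat") get an idf of exactly 1, so rarer terms such as "cat" receive the larger weight within the same sentence.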
Example Execution Flow
Input: "Hello world. How are you?"
Segmentation: ["Hello world", "How are you"]
Feature Extraction: [Feature_Vector1, Feature_Vector2]
Vector Encoding: [Encoded_Vector1, Encoded_Vector2]
Vector Representation: [Embedded_Vector1, Embedded_Vector2]
Aggregation: Document_Vector
Output: Document_Vector (Numerical representation of the document)
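
The flow above can be traced with concrete numbers. The sketch below assumes plain term counts over a shared vocabulary as a stand-in for the feature vectors, and it passes the vectors through the encoding and representation steps unchanged. Those simplifications are made for this example only and replace TF-IDF, doc2vec, and PCA, so only the shape of the data flow is meaningful.

```python
import re

text = "Hello world. How are you?"

# Segmentation: split on sentence-ending punctuation and drop empty pieces.
sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]

# Feature extraction (stand-in): term counts over a sorted shared vocabulary.
tokens = [s.lower().split() for s in sentences]
vocabulary = sorted({term for sentence in tokens for term in sentence})
feature_vectors = [
    [float(sentence.count(term)) for term in vocabulary] for sentence in tokens
]

# Vector encoding and vector representation: pass-through in this trace.
encoded_vectors = feature_vectors
embedded_vectors = encoded_vectors

# Aggregation: element-wise mean across the sentence vectors.
document_vector = [
    sum(column) / len(embedded_vectors) for column in zip(*embedded_vectors)
]

print(sentences)
print(vocabulary)
print(feature_vectors)
print(document_vector)
```

The segmentation step yields ["Hello world", "How are you"], matching the flow above. The vocabulary is ['are', 'hello', 'how', 'world', 'you'], and because each word appears in exactly one of the two sentences, the document vector is [0.5, 0.5, 0.5, 0.5, 0.5].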