Updating examples for security tags (#3224)
* updating examples

* examples added

* spellcheck addition
udaij12 committed Jul 3, 2024
1 parent 55bcf9d commit 589b351
Showing 58 changed files with 201 additions and 174 deletions.
22 changes: 11 additions & 11 deletions examples/FasterTransformer_HuggingFace_Bert/README.md
@@ -1,16 +1,16 @@
## Faster Transformer

Batch inferencing with Transformers faces two challenges:

- Large batch sizes suffer from higher latency, while small and medium batch sizes become bound by kernel-launch latency.
- Padding wastes a lot of compute: an input of shape (batchsize, seq_length) must be padded to (batchsize, max_length), and the gap between avg_length and max_length results in considerable wasted computation; increasing the batch size worsens this situation.

[Faster Transformers](https://github.com/NVIDIA/FasterTransformer/blob/main/examples/pytorch/bert/run_glue.py) (FT) from Nvidia, along with [Efficient Transformers](https://github.com/bytedance/effective_transformer) (EFFT) built on top of FT, addresses the above two challenges by fusing the CUDA kernels and dynamically removing padding during computation. The current implementation of [Faster Transformers](https://github.com/NVIDIA/FasterTransformer/blob/main/examples/pytorch/bert/run_glue.py) supports BERT-like encoder and decoder layers. In this example, we show how to get a Torchscripted (traced) EFFT variant of a Bert model from HuggingFace (HF) for sequence classification and question answering, and how to serve it.
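As a rough illustration of the tracing step, the sketch below shows what tracing a HuggingFace BERT classifier to TorchScript looks like with the plain `transformers`/`torch` APIs. It is illustrative only; the walkthrough below uses FasterTransformer's `Bert_FT_trace.py` script, and exact tokenizer calls may differ across `transformers` versions.

```python
# Illustrative sketch only: trace a HuggingFace BERT classifier to TorchScript.
# The actual example below relies on FasterTransformer's Bert_FT_trace.py instead.
import torch
from transformers import BertForSequenceClassification, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
# torchscript=True makes the model return plain tuples, which jit.trace expects
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", torchscript=True)
model.eval()

encoded = tokenizer.encode_plus("A sample sentence to trace with.", return_tensors="pt")
traced = torch.jit.trace(model, (encoded["input_ids"], encoded["attention_mask"]))
traced.save("traced_bert_seq_classification.pt")
```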


### How to get a Torchscripted (traced) EFFT variant of an HF Bert model and serve it

**Requirements**

Running Faster Transformer is currently recommended through the [NVIDIA docker and NGC container](https://github.com/NVIDIA/FasterTransformer#requirements); it also requires a [Volta](https://www.nvidia.com/en-us/data-center/volta-gpu-architecture/), [Turing](https://www.nvidia.com/en-us/geforce/turing/), or [Ampere](https://www.nvidia.com/en-us/data-center/nvidia-ampere-gpu-architecture/) based GPU. For this example we used a **g4dn.2xlarge** EC2 instance, which has a T4 GPU.

@@ -34,9 +34,9 @@ mkdir -p build

cd build

cmake -DSM=75 -DCMAKE_BUILD_TYPE=Release -DBUILD_PYT=ON .. # -DSM values: 60 (P40), 61 (P4), 70 (V100), 75 (T4), 80 (A100); 75 is used here for the T4

make

pip install transformers==2.5.1

@@ -45,8 +45,8 @@ cd /workspace
# clone Torchserve to access examples
git clone https://github.com/pytorch/serve.git

# install torchserve
cd serve

pip install -r requirements/common.txt

@@ -99,7 +99,7 @@ mkdir model_store

mv BERTSeqClassification.mar model_store/

-torchserve --start --model-store model_store --models my_tc=BERTSeqClassification.mar --ncs
+torchserve --start --model-store model_store --models my_tc=BERTSeqClassification.mar --ncs --disable-token-auth --enable-model-api

curl -X POST http://127.0.0.1:8080/predictions/my_tc -T ../Huggingface_Transformers/Seq_classification_artifacts/sample_text_captum_input.txt

@@ -132,7 +132,7 @@ cd /workspace/FasterTransformer/build/
# --data_type can be fp16 or fp32
python pytorch/Bert_FT_trace.py --mode question_answering --model_name_or_path "/workspace/serve/Transformer_model" --tokenizer_name "bert-base-uncased" --batch_size 1 --data_type fp16 --model_type thsext

cd -

# make sure to change the ../Huggingface_Transformers/setup_config.json "save_mode":"torchscript"

@@ -142,10 +142,10 @@ mkdir model_store

mv BERTQA.mar model_store/

-torchserve --start --model-store model_store --models my_tc=BERTQA.mar --ncs
+torchserve --start --model-store model_store --models my_tc=BERTQA.mar --ncs --disable-token-auth --enable-model-api

curl -X POST http://127.0.0.1:8080/predictions/my_tc -T ../Huggingface_Transformers/QA_artifacts/sample_text_captum_input.txt

```
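For reference, the same inference request can be sent from Python instead of curl. A minimal sketch with the `requests` library, assuming the server is running on the default inference port 8080:

```python
# Minimal sketch: POST the sample input file to the TorchServe inference API,
# equivalent to the curl command above.
import requests

url = "http://127.0.0.1:8080/predictions/my_tc"
with open("../Huggingface_Transformers/QA_artifacts/sample_text_captum_input.txt", "rb") as f:
    response = requests.post(url, data=f)

print(response.status_code)
print(response.text)
```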

####
14 changes: 7 additions & 7 deletions examples/Huggingface_Transformers/README.md
@@ -114,7 +114,7 @@ To register the model on TorchServe using the above model archive file, we run t
```
mkdir model_store
mv BERTSeqClassification.mar model_store/
-torchserve --start --model-store model_store --models my_tc=BERTSeqClassification.mar --disable-token --ncs
+torchserve --start --model-store model_store --models my_tc=BERTSeqClassification.mar --disable-token --ncs --disable-token-auth --enable-model-api
```

@@ -164,7 +164,7 @@ torch-model-archiver --model-name BERTTokenClassification --version 1.0 --serial
```
mkdir model_store
mv BERTTokenClassification.mar model_store
-torchserve --start --model-store model_store --models my_tc=BERTTokenClassification.mar --disable-token --ncs
+torchserve --start --model-store model_store --models my_tc=BERTTokenClassification.mar --disable-token --ncs --disable-token-auth --enable-model-api
```

### Run an inference
@@ -208,7 +208,7 @@ torch-model-archiver --model-name BERTQA --version 1.0 --serialized-file Transfo
```
mkdir model_store
mv BERTQA.mar model_store
-torchserve --start --model-store model_store --models my_tc=BERTQA.mar --disable-token --ncs
+torchserve --start --model-store model_store --models my_tc=BERTQA.mar --disable-token --ncs --disable-token-auth --enable-model-api
```
### Run an inference
To run an inference: `curl -X POST http://127.0.0.1:8080/predictions/my_tc -T QA_artifacts/sample_text_captum_input.txt`
@@ -255,7 +255,7 @@ To register the model on TorchServe using the above model archive file, we run t
```
mkdir model_store
mv Textgeneration.mar model_store/
-torchserve --start --model-store model_store --models my_tc=Textgeneration.mar --disable-token --ncs
+torchserve --start --model-store model_store --models my_tc=Textgeneration.mar --disable-token --ncs --disable-token-auth --enable-model-api
```

### Run an inference
@@ -272,7 +272,7 @@ For batch inference the main difference is that you need set the batch size whil
```
mkdir model_store
mv BERTSeqClassification.mar model_store/
-torchserve --start --model-store model_store --disable-token --ncs
+torchserve --start --model-store model_store --disable-token --ncs --disable-token-auth --enable-model-api
curl -X POST "localhost:8081/models?model_name=BERTSeqClassification&url=BERTSeqClassification.mar&batch_size=4&max_batch_delay=5000&initial_workers=3&synchronous=true"
```
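The same registration can also be done from Python against the management API on port 8081. A minimal sketch using the identical parameters as the curl command above:

```python
# Minimal sketch: register BERTSeqClassification with batching enabled via the
# TorchServe management API (mirrors the curl command above).
import requests

params = {
    "model_name": "BERTSeqClassification",
    "url": "BERTSeqClassification.mar",
    "batch_size": 4,
    "max_batch_delay": 5000,
    "initial_workers": 3,
    "synchronous": "true",
}
response = requests.post("http://localhost:8081/models", params=params)
print(response.status_code, response.text)
```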
@@ -297,7 +297,7 @@ For batch inference the main difference is that you need set the batch size whil
```
mkdir model_store
mv BERTSeqClassification.mar model_store/
-torchserve --start --model-store model_store --ts-config config.properties --models BERTSeqClassification=BERTSeqClassification.mar
+torchserve --start --model-store model_store --ts-config config.properties --models BERTSeqClassification=BERTSeqClassification.mar --disable-token-auth --enable-model-api
```
Now, to run batch inference, the following command can be used:
@@ -377,7 +377,7 @@ To register the model on TorchServe using the above model archive file, we run t
```
mkdir model_store
mv Textgeneration.mar model_store/
-torchserve --start --model-store model_store --disable-token
+torchserve --start --model-store model_store --disable-token --disable-token-auth --enable-model-api
curl -X POST "localhost:8081/models?model_name=Textgeneration&url=Textgeneration.mar&batch_size=1&max_batch_delay=5000&initial_workers=1&synchronous=true"
```
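After registering, you can verify the worker and batch configuration through the management API's describe endpoint. A short sketch:

```python
# Minimal sketch: describe the registered Textgeneration model via the
# TorchServe management API to confirm workers and batch settings.
import requests

response = requests.get("http://localhost:8081/models/Textgeneration")
print(response.json())
```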

4 changes: 3 additions & 1 deletion examples/LLM/llama/chat_app/torchserve_server_app.py
@@ -10,7 +10,9 @@


 def start_server():
-    os.system("torchserve --start --model-store model_store --ncs")
+    os.system(
+        "torchserve --start --model-store model_store --ncs --disable-token-auth --enable-model-api"
+    )
     st.session_state.started = True
     st.session_state.stopped = False
     st.session_state.registered = False
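
For symmetry with `start_server`, a stop helper could look like the sketch below. It is not part of this commit; it simply mirrors the session-state flags used above.

```python
# Hypothetical companion helper, shown for illustration only.
def stop_server():
    os.system("torchserve --stop")
    st.session_state.stopped = True
    st.session_state.started = False
    st.session_state.registered = False
```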