---
author: "CatalystRPA"
title: "Automation of repetitive tasks"
subtitle: "Using classical programming and Robotic Process Automation"
format:
html:
embed-resources: true
editor: source
date: last-modified
toc: true
title-block-banner: "#6699ff"
# title-block-banner: images/hot.png
title-block-banner-color: "#ffffff"
published-title: "Last Published"
editor_options:
chunk_output_type: console
---
![](images/hot.png)
![](images/DLH.png)
# Introduction
Data and data-related tasks are getting more and more attention nowadays, but they are nothing new. In this lesson we'll give you an overview of how to automate repetitive data tasks using either "classical" programming languages or Robotic Process Automation tools.
![](images/barley.png)
## Spreadsheet software
More recently, Excel and other spreadsheets were a huge revolution for productivity, and were a cornerstone of the early Mac and Windows platforms.
![we've come a long way](images/bill.png)
The next step after Excel is the widespread use of programmatic tools directly, which expands the range of tasks that can be automated compared to Excel, for example by interacting with the User Interface of applications.
## Programming, scripting, and Robotic Process Automation
Programming knowledge is becoming ever more widespread, and we see many Excel users migrating to programming languages, mostly Python and R.
Some call these users "Citizen Developers".
These tools are mainly used for *scripting*, but they can be used to program robust solutions as well.
*Scripting* aims at developing solutions that are immediately valuable, either through automation or data exploration. It is most commonly done using the functional programming paradigm rather than the object-oriented one, although that of course differs from developer to developer. Functional programs decompose complex problems into simple *functions*, while object-oriented programs decompose them into simple *objects*. What we will showcase in this course is functional programming.
From [functional languages](https://adv-r.hadley.nz/fp.html):
> Recently, functional techniques have experienced a surge in interest because they can produce efficient and elegant solutions to many modern problems. A functional style tends to create functions that can easily be analysed in isolation (i.e. using only local information), and hence is often much easier to automatically optimise or parallelise. The traditional weaknesses of functional languages, poorer performance and sometimes unpredictable memory usage, have been much reduced in recent years. Functional programming is complementary to object-oriented programming, which has been the dominant programming paradigm for the last several decades.
Python and R are multi-paradigm languages, supporting both of these concepts.
### Which tools to choose for an automation task?
Regardless of the paradigm you choose, the best way to automate a task involving a business application is to interact with the back-end and/or the data directly, without relying on the User Interface. This can be done, for example, with stored procedures in databases, or with scripting languages.
However, there are cases where this is not possible, for example when you only have access to the graphical user interface of a program like SAP, or of a business web application.
In these cases, we have to rely on the UI. The good news is that, even though this method is slower and less reliable, we can programmatically interact with most UIs.
This is where you will start building robots (bots) to do such tasks.
::: {.callout-caution}
Some websites will put measures in place to detect and deny access to such robots. Typically, for example, betting and social media websites will use a CAPTCHA system (*Completely Automated Public Turing test to tell Computers and Humans Apart*) to deny access, and will ban users identified as bots. Applications such as Poker room clients, for example, will also include active protection against spying on their UI.
:::
Specific tools are available to build bots that can interact with UI elements.
1. *Browser automation libraries* such as [Selenium](https://www.selenium.dev/), available in Python, and [Rvest](https://rvest.tidyverse.org/) in R, let you drive *browsers*, sending data to and retrieving data from the UI of Chrome, Edge, Firefox, Safari and so on.
2. *Robotic Process Automation* tools are special programs in which users can, using *visual programming*, create programs that automate the UIs of both web browsers and locally installed target software, such as the SAP client.
How to choose?
- If you can avoid using a UI in your automation tasks, do it. Always favor interacting with the back-end or the data directly as a first choice, for better consistency, speed and reliability.
- If you need to interact with a UI because, for example, you don't have access to the application back-end or data, then:
    - If your target application runs in a web browser, you can choose between a classical programming library such as Selenium and an RPA tool. Ultimately the choice is yours, though we recommend using a classical library. If you prefer visual programming, use an RPA tool (although Selenium offers a browser add-on, [Selenium IDE](https://www.selenium.dev/selenium-ide/), which simplifies the task of creating robots with minimal programming).
    - If your target application is a locally installed client, you need to use RPA.
```{mermaid}
flowchart TD
UI{Do you need<br> to interact<br> with a UI?} --> Yes(Yes)
UI --> No(No)
No --> programming(Use programming)
Yes --> webb{Is the target app<br> a web browser?}
webb --> yesweb(Yes)
webb --> noweb(No)
noweb --> rpa(use an RPA tool)
yesweb --> visu{Do you really <br>prefer visual <br>programming?}
visu --> okvisu(I need visual<br> programming)
okvisu --> rpa
visu --> oknovisu(No, it's OK to use code)
oknovisu --> sel(Use a browser automation library such as Selenium with python, or Rvest in R)
```
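The decision flow above can also be written down as a small function. This is only a sketch: the function name `choose_tool` and its parameter names are made up for illustration, and the returned labels simply mirror the boxes of the flowchart.

```python
def choose_tool(needs_ui: bool, is_web_app: bool = False, prefers_visual: bool = False) -> str:
    """Return the kind of tool suggested by the flowchart (hypothetical helper)."""
    if not needs_ui:
        # Back-end or data access is available: plain programming is the first choice
        return "programming"
    if not is_web_app:
        # Locally installed client: only RPA tools can drive its UI
        return "RPA tool"
    if prefers_visual:
        # Web application, but the user really wants visual programming
        return "RPA tool"
    # Web application and code is acceptable
    return "browser automation library"


print(choose_tool(needs_ui=False))
print(choose_tool(needs_ui=True, is_web_app=True))
print(choose_tool(needs_ui=True, is_web_app=False))
```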
## Tools used in this course
The three main tools we will use are:
### R
![](images/R.png)
Created in 1993 by Ross Ihaka and Robert Gentleman in Auckland. It is free to use, even for commercial purposes. It is versatile and great for scripting.
The company behind the popular RStudio IDE recently changed its name to Posit, asserting its strategy of developing tools for both R and Python.
::: {.callout-tip}
In this course, we will use the RStudio IDE to author and run both R and Python scripts.
:::
![New name for RStudio](images/posit2.png){fig-align="left"}
For a more in-depth introduction to R, we suggest [R for Data Science](https://r4ds.hadley.nz/) by Posit's chief scientist, [Hadley Wickham](https://hadley.nz/).
![Hadley Wickham](images/hadley.jpg){fig-align="left"}
### Python
An OOP and functional language created by the Dutchman Guido van Rossum in 1991.
![](images/python.png)
Like R, it is fully open-source and multi-platform. The powerful Selenium library, which allows the creation of web browser automation bots, provides official Python bindings.
### UIPath
Romanian firm founded in 2005 in Bucharest by Daniel Dines and Marius Tîrcă.
![](images/uipath.png)
Closed-source, paid software (a Community Edition is available) that uses visual programming. It is a classic example of what is called "RPA". These tools make it possible to **spy** on and interact with elements in a User Interface. An example video can be found [here](https://www.uipath.com/learning/video-tutorials/advanced-ui-automation).
Of course, all these tools (R, Python, UIPath) can be combined to exploit each other's capabilities in a hybrid solution. That's what we will showcase later.
### Note: Python or R?
We think that R is best for scripting, as it is extremely flexible and lets you write scripts very quickly.
Some libraries such as Selenium are best used natively with Python (although R bindings exist with RSelenium).
R is mostly used with functional rather than object-oriented programming, although [many R libraries](https://adv-r.hadley.nz/oo.html) make it possible to use OOP in R. Their number makes the task somewhat harder than in Python, since you need to know the specifics of the particular OOP system you want to use. In Python, a single OOP implementation is available natively.
This is a classic example of the difference between those languages. Python imposes a relatively strict framework, and will error out more quickly than R when the behaviour of the program is unexpected. R allows for much more flexibility, and in many cases *guesses* what you're trying to do, which can sometimes lead to surprising behaviour. In addition, several R packages modify the base R syntax by overloading the base operators, so you effectively have several styles of syntax you can use in R, depending on which packages you load. This is because R exposes a very high level of [metaprogramming](https://adv-r.hadley.nz/metaprogramming.html).
We think, however, that this is a worthwhile price to pay for the huge degree of freedom and the quick development possibilities offered by R.
Small example of the difference in behaviour:
```{python, eval = F}
#| code-fold: false
# PYTHON
mycolumn = list(range(5)) #[0, 1, 2, 3, 4]
# implicit loop on a python list with list comprehension
[10/x for x in mycolumn] # ERROR: ZeroDivisionError: division by zero
```
Python throws an error when dividing by 0, whereas R returns `Inf`:
```{r , eval = T}
#| code-fold: false
# R
mycolumn = c(0:4)
# in R, many operations are vectorized natively
10/mycolumn # 10/0 = Inf
```
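If you want Python to behave like R here, you have to ask for it explicitly. The sketch below is one possible way to do so with the standard library only; the helper name `r_style_divide` is made up for this example and is not part of Python or of any library.

```python
import math


def r_style_divide(numerator: float, denominator: float) -> float:
    """Divide like R does: x/0 gives Inf or -Inf, and 0/0 gives NaN, instead of raising."""
    try:
        return numerator / denominator
    except ZeroDivisionError:
        if numerator == 0:
            return math.nan
        return math.inf if numerator > 0 else -math.inf


mycolumn = list(range(5))  # [0, 1, 2, 3, 4]
result = [r_style_divide(10, x) for x in mycolumn]
print(result)  # the first element is inf, as in R
```

Whether silently returning `inf` is desirable depends on the task: Python's default of raising an error makes the problem visible immediately, which is the stricter behaviour described above.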
#### Note: pandas, python and R
As we saw above, the lack of native vectorization in Python sometimes calls for more elaborate (*"pythonic"*) syntax to perform simple operations. Pandas is a widely used Python library that alleviates some of that overhead and makes it possible to do in Python some of the vectorized operations that exist natively in R. Its creator, [Wes McKinney](https://wesmckinney.com/), recently [joined Posit](https://posit.co/blog/welcome-wes/).
![Wes McKinney, creator of the pandas library for python, recently joined Posit](images/wes.jpg)
On [https://pandas.pydata.org/docs/getting_started/comparison/comparison_with_r.html](https://pandas.pydata.org/docs/getting_started/comparison/comparison_with_r.html) we can find a syntax correspondence table between R and pandas.
> Since pandas aims to provide a lot of the data manipulation and analysis functionality that people use R for, this page was started to provide a more detailed look at the R language and its many third party libraries as they relate to pandas.
Enough for the introduction, let's move on to the content.
## Programming for Excel users
Programming need not be any more complicated than using Excel functions. Let's look at some examples in R and Python:
### The SUMIF excel formula
![](images/excel-sumif.png)
This is how this basic example could be done (think about it before reading the solution):
In R:
```{r , eval = T}
#| code-fold: false
values <- 1:5
types <- c('A', 'B', 'C', 'A', 'B')
# summing where type == A
sum(values[types=='A']) # 5
```
This syntax is pretty much self-explanatory. Is it really more complicated than Excel formulas? Note the *vectorized* operations in R.
In Python:
```{python , eval = T}
#| code-fold: false
values = list(range(1,6)) # note the index difference vs R
types = ['A', 'B', 'C', 'A', 'B']
# summing where type == A
sum(val for val, t in zip(values, types) if t == 'A')
```
Note that Python doesn't have vectorized operations by default. We need a few more constructs:
- `(val for val, t in zip(values, types) if t == 'A')`: This is a generator expression. It yields the value `val` from each pair whose type `t` is equal to `'A'`. A generator expression produces values lazily, meaning it only computes the next value when it's needed.
- `zip(values, types)`: This function combines the elements of `values` and `types` into pairs. It pairs the first element of `values` with the first element of `types`, the second with the second, and so on.
This illustrates some differences between the two languages. There are often many ways to perform the same operation.
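To make the Python version reusable, the generator expression can be wrapped in a function that reads like the Excel formula. This is a minimal sketch: `sumif` is a hypothetical helper written for this lesson, not a built-in.

```python
def sumif(values, criteria_range, criterion):
    """Sum the items of `values` whose matching item in `criteria_range` equals `criterion`."""
    if len(values) != len(criteria_range):
        # zip() would silently drop the extra items, so check the lengths first
        raise ValueError("values and criteria_range must have the same length")
    return sum(val for val, key in zip(values, criteria_range) if key == criterion)


values = list(range(1, 6))  # [1, 2, 3, 4, 5]
types = ['A', 'B', 'C', 'A', 'B']
total_a = sumif(values, types, 'A')
print(total_a)  # 5
```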
### The VLOOKUP excel formula
```{R, eval = F}
#| code-fold: false
# Sample data
df1 <- data.frame(ID = c(1, 2, 3, 4),
Name = c("Alice", "Bob", "Charlie", "David"))
df2 <- data.frame(ID = c(1, 2, 3, 4),
Score = c(85, 90, 75, 80))
# VLOOKUP in R
df1$Score <- df2$Score[match(df1$ID, df2$ID)]
#print(df1)
```
Try running the code above in RStudio and examine the resulting `df1`.
```{python, eval = F}
import pandas as pd
# Sample data
df1 = pd.DataFrame({'ID': [1, 2, 3, 4],
'Name': ['Alice', 'Bob', 'Charlie', 'David']})
df2 = pd.DataFrame({'ID': [1, 2, 3, 4],
'Score': [85, 90, 75, 80]})
# VLOOKUP in Python with pandas
df1['Score'] = df1['ID'].map(df2.set_index('ID')['Score'])
#print(df1)
```
In RStudio, create a new Python script, execute the code above, and examine the resulting `df1`.
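Under the hood, a VLOOKUP is a key-based lookup, which plain Python expresses with a dictionary. The sketch below reuses the sample data from this section; the helper name `vlookup` and its `default` parameter are assumptions made for illustration.

```python
ids = [1, 2, 3, 4]
names = ["Alice", "Bob", "Charlie", "David"]
# The lookup table: ID -> Score
scores_by_id = {1: 85, 2: 90, 3: 75, 4: 80}


def vlookup(key, table, default=None):
    """Return the value stored for `key`, or `default` when the key is absent."""
    return table.get(key, default)


rows = [
    {"ID": i, "Name": n, "Score": vlookup(i, scores_by_id)}
    for i, n in zip(ids, names)
]
for row in rows:
    print(row)
```

Returning a default for a missing key plays the same role as the `NA` produced by `match()` in R or the `NaN` produced by `map()` in pandas.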
### Exercise 1: combining two files
Let's see how we can explore data using scripts.
In this first exercise, read two CSV files as dataframes, and combine them by appending the rows of the second one to the first one.
```{R, eval = T, echo=F}
a <- read.csv(r"[C:\CatalystRPA_automatisation_HoT-main\Exercices\Exercice 2\Resolution\R\A.csv]")
b <- read.csv(r"[C:\CatalystRPA_automatisation_HoT-main\Exercices\Exercice 2\Resolution\R\B.csv]")
```
```{R, eval = T}
#| code-fold: true
a
b
```
- Go to [https://github.com/gpierard/CatalystRPA_trainings_public](https://github.com/gpierard/CatalystRPA_trainings_public) and download the files `a.csv` and `b.csv`.
- Use Google to find a way to read the files in R or python, as you prefer.
- Combine these datasets together by adding the rows of the second file after the first one.
::: {.callout-tip}
[stackoverflow.com](https://stackoverflow.com) and ChatGPT can be very helpful.
:::
Solution (try it yourself first)
In R:
```{R, eval = F}
#| code-fold: true
starttime <- Sys.time()
a <- read.csv(r"[C:\CatalystRPA_automatisation_HoT-main\Exercices\Exercice 2\Resolution\R\A.csv]")
b <- read.csv(r"[C:\CatalystRPA_automatisation_HoT-main\Exercices\Exercice 2\Resolution\R\B.csv]")
result <- rbind(a,b) # concatenating data from both files
result$double_age <- result$Age*2
endtime <- Sys.time()
endtime-starttime
print(result)
```
In Python:
```{python, eval = F}
#| code-fold: true
import time
import pandas as pd
starttime = time.time()
# Read CSV files
a = pd.read_csv(r"C:\CatalystRPA_automatisation_HoT-main\Exercices\Exercice 2\Resolution\R\A.csv")
b = pd.read_csv(r"C:\CatalystRPA_automatisation_HoT-main\Exercices\Exercice 2\Resolution\R\B.csv")
# Concatenate dataframes
result = pd.concat([a, b], ignore_index=True)
# Create a new column 'double_age' by doubling the 'Age' column
result['double_age'] = result['Age'] * 2
endtime = time.time()
print(endtime - starttime) # elapsed time in seconds
print(result)
```
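If pandas is not available, the same combination can be sketched with Python's standard `csv` module. To keep the example self-contained, two rows from each file are inlined as strings instead of being read from disk; the helper name `read_rows` is made up for this sketch.

```python
import csv
import io

# Two rows from each file, inlined so that the sketch runs without the CSV files
A_CSV = "Name,Age\nJack,22\nGeorge,30\n"
B_CSV = "Name,Age\nMichael,16\nRobert,80\n"


def read_rows(text):
    """Parse CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


# Appending the rows of the second file after those of the first one
result = read_rows(A_CSV) + read_rows(B_CSV)
for row in result:
    # csv reads everything as text, so convert before doubling
    row["double_age"] = int(row["Age"]) * 2

for row in result:
    print(row["Name"], row["Age"], row["double_age"])
```

With real files, `io.StringIO(text)` would be replaced by a file opened with `open(path, newline="")`.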
### Exercise 2: Under the hood
This second example is a bit more complicated.
- Go to [https://github.com/gpierard/CatalystRPA_trainings_public](https://github.com/gpierard/CatalystRPA_trainings_public) and download the file `cars.xlsx`
- Use Google to find a way to read this file in R or python, as you prefer.
- Part 1: Calculate the mean price of vehicles with at least 6 cylinders.
- Part 2: Calculate the mean price of vehicles with at least 6 cylinders and over 200HP.
Again, [stackoverflow.com](https://stackoverflow.com) is a useful resource.
#### Solution (in R) - try by yourself first:
```{r , eval = T}
#| code-fold: true
library(openxlsx) # install.packages("openxlsx")
library(assertthat) # install.packages("assertthat")
df <- read.xlsx(r"[C:\CatalystRPA_automatisation_HoT-main\Exercices\Exercice 2\Resolution\R\cars.xlsx]")
df$numprice <- gsub(" Euro", "", df$Base.price, ignore.case = T)
df$numprice <- as.numeric(df$numprice)
# assert_that(is.numeric(df$numprice), msg="error, the new price is not numeric")
# plot(df$numprice[order(df$numprice)])
# part 1 ------------------------------------------------------------------
myfilter <- df$Number.of.cylinders>=6
df2 <- df[myfilter,]
# nrow(df);ncol(df); dim(df2)
# class(df2)
meanprice1 <- mean(df2$numprice)
# print(meanprice1)
# part 2 ------------------------------------------------------------------
myfilter <- df$Number.of.cylinders>=6 & df$`Power.(HP)`>=200
meanprice2 <- mean(df$numprice[myfilter])
# print(meanprice2)
barplot(c(mean(df$numprice), meanprice1, meanprice2), main="Avg Price (EUR)", names.arg = c("All vehicles", ">=6cyl", ">=6cyl/200HP"))
```
#### Solution in Python
We can perform the same kind of operations with pandas in Python. Reading `.xlsx` files with pandas also requires the `openpyxl` package.
```
pip install pandas openpyxl
```
```{python , eval = T}
#| code-fold: true
import pandas as pd
# Read the Excel file
df = pd.read_excel(r"C:\CatalystRPA_automatisation_HoT-main\Exercices\Exercice 2\Resolution\R\cars.xlsx")
# Remove " Euro" from the 'Base price' column and convert to numeric
df['numprice'] = df['Base price'].str.replace(' Euro', '').astype(float)
# Part 1
myfilter = df['Number of cylinders'] >= 6 # vectorized operations, as in R.
df2 = df[myfilter]
# Calculate the mean price
meanprice1 = df2['numprice'].mean()
# print(meanprice1)
# remainder is left for you to finish
```
As you can see, the pandas syntax in Python is similar, and offers the same kind of capabilities as R, including vectorization.
# Browser automation in Python with Selenium
Now that we have some basic experience with R and Python, let's dive into browser automation with a Selenium example and see how it actually works.
::: {.callout-tip}
Selenium is best used with Python.
:::
Head to [Selenium - Getting Started](https://www.selenium.dev/documentation/webdriver/getting_started/) and install the Python library.
```pip install selenium```
### Example
```{python, eval = F}
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
driver = webdriver.Chrome() # start the chrome driver
driver.implicitly_wait(1) # set an implicit wait time of 1 second
driver.maximize_window()
driver.get("https://www.google.com/?gl=us&hl=en&gws_rd=cr&pws=0")
```
In order to interact with the different elements, we can use Chrome DevTools:
> Chrome DevTools is a set of web developer tools built directly into the Google Chrome browser. DevTools can help you edit pages on-the-fly and diagnose problems quickly, which ultimately helps you build better websites, faster.
Either right-click on the element you need to interact with and select `Inspect`, or use the `Ctrl+Shift+C` shortcut. Let's try that on the Google search bar.
![](images/google1.png)
Let's zoom in on the elements.
![](images/google2.png)
We see that the HTML tag of the search bar is `textarea`, and that it has a `title` attribute with value `Search`. Let's use that information in our Selenium program.
```{python, eval = F}
text_areas = driver.find_elements(By.TAG_NAME, "textarea")
len(text_areas) # 2
```
You can locate elements with either `find_elements()` or `find_element()`. If the driver can't find any element matching your criteria within the implicit wait time, `find_elements()` returns an empty list, whereas `find_element()` raises a `NoSuchElementException` (an explicit `WebDriverWait` raises a `TimeoutException` instead).
Here, we've found 2 elements with tag `textarea`. Let's get the `title` attribute of each, using Python's list comprehension syntax.
From the [Python documentation](https://docs.python.org/2/tutorial/datastructures.html):
> List comprehensions provide a concise way to create lists. Common applications are to make new lists where each element is the result of some operations applied to each member of another sequence or iterable, or to create a subsequence of those elements that satisfy a certain condition.
```{python, eval = F}
titles = [x.get_attribute("title") for x in text_areas]
titles.index("Search") # 0
```
The results may differ for you, but I get that `titles` equals `['Search', '']`, meaning that only the first element in my list has the `Search` title. We can extract the corresponding index of this element using the `index` method of the list.
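Note that `list.index` raises a `ValueError` when the value is absent, which would stop the robot if the page layout changes. A more defensive lookup could look like the sketch below. The helper `first_with_attribute` is made up for this lesson, and `FakeElement` is only a stand-in for Selenium's elements so that the snippet runs without a browser; with a real driver you would pass `text_areas` from `driver.find_elements(...)` instead.

```python
class FakeElement:
    """Stand-in for a Selenium WebElement; only get_attribute() is mimicked."""

    def __init__(self, **attributes):
        self._attributes = attributes

    def get_attribute(self, name):
        return self._attributes.get(name, "")


def first_with_attribute(elements, name, value):
    """Return the first element whose attribute `name` equals `value`, or None."""
    return next((el for el in elements if el.get_attribute(name) == value), None)


text_areas = [FakeElement(title="Search"), FakeElement(title="")]
searchbar = first_with_attribute(text_areas, "title", "Search")
missing = first_with_attribute(text_areas, "title", "Does not exist")

print(searchbar is text_areas[0])  # True
print(missing)  # None
```

Checking for `None` before calling `send_keys()` lets the script report a clear message instead of failing with a traceback.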
```{python, eval = F}
searchbar = text_areas[titles.index("Search")]
searchbar.clear() # clear any previously entered text
searchbar.send_keys("CatalystRPA")
searchbar.send_keys(Keys.ENTER)
```
We're now at the results page.
```{python, eval = F}
driver.current_url #'https://www.google.com/search?q=CatalystRPA&sca_esv=9 ...
```
Let's retrieve the first result and click it. Like before, we can inspect the result link.
![](images/gres.png)
Here, it's slightly more complicated: the class names are not meaningful, which suggests they may change between runs of the robot. We do, however, have a parent `div` with `id="search"`.
```{python, eval = F}
# retrieve the parent searchresult element
searchresults = driver.find_elements(By.XPATH, "//div[@id='search']")
len(searchresults) # 1
searchres = searchresults[0]
```
Now that we've identified the parent element, we can narrow down our search. Let's examine the `href` attributes of the `a` tags contained in the search results. We can see that the `href` of the first element contains the link we're after.
```{python, eval = F}
# check the first link of the results
allresults = searchres.find_elements(By.TAG_NAME, "a")
len(allresults) # 37
allresults[0].get_attribute('href') # 'https://www.catalystrpa.com/' this is the correct element, first search result
# however, this element is not clickable
```
Since that `a` element is not clickable, we can retrieve the first `h3` tag contained in `searchres` and click it (of course, we could also have used the `href` URL directly with `driver.get(<url>)`). There are often many ways to perform the same action, some more elegant than others. This is just an illustrative example.
```{python, eval = F}
all_links = searchres.find_elements(By.TAG_NAME, "h3")
len(all_links) # 12
allresults = searchres.find_elements(By.TAG_NAME, "a")
links = [x.get_attribute('href') for x in allresults]
# the matching index is 0 here, so the first h3 is the title of the first result
all_links[links.index('https://www.catalystrpa.com/')].click() # OK
driver.current_url # 'https://www.catalystrpa.com/' OK
# we could continue...
```
### Exercise: Selenium
Click below to unfold the full script from the previous example, and use it as a model to create a similar Python script in RStudio (or another IDE) that uses Selenium to find the website of your current organization on Google and visit it.
```{python, eval = F}
#| code-fold: true
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
driver = webdriver.Chrome() # start the chrome driver
driver.implicitly_wait(1) # set an implicit wait time of 1 second
driver.maximize_window()
driver.get("https://www.google.com/?gl=us&hl=en&gws_rd=cr&pws=0") # US website
text_areas = driver.find_elements(By.TAG_NAME, "textarea")
len(text_areas)
titles = [x.get_attribute("title") for x in text_areas]
searchbar = text_areas[titles.index("Search")]
searchbar.clear() # clear any previously entered text
searchbar.send_keys("CatalystRPA")
searchbar.send_keys(Keys.ENTER)
driver.current_url
# there are many div elements on the page
div_elements = driver.find_elements(By.TAG_NAME, "div")
len(div_elements) # 1251
# retrieve the parent searchresult element
searchresults = driver.find_elements(By.XPATH, "//div[@id='search']")
len(searchresults) # 1
searchres = searchresults[0]
# check the first link of the results
allresults = searchres.find_elements(By.TAG_NAME, "a")
len(allresults) # 37
allresults[0].get_attribute('href') # 'https://www.catalystrpa.com/' this is the correct element, first search result
# however, this element is not clickable
all_links = searchres.find_elements(By.TAG_NAME, "h3")
len(all_links) # 12
all_links[0].click() # OK
driver.current_url # 'https://www.catalystrpa.com/'
# we could continue...
```
## Combining tools
There are several ways to combine UIPath robots with R and Python scripts.
For example, we can use `system` in R to run system commands, or we can use the terminal directly.
```
system("python myscript.py")
```
To launch an R script from Python, use
```
import subprocess
# assumes the Rscript executable is available on the PATH
subprocess.call(["Rscript", "/pathto/MyrScript.r"])
```
To execute a UIPath sequence, we can use the UiRobot utility.
```
"C:\Users\{UserName}\AppData\Local\UiPath\app-20.4.1-beta0022\UiRobot.exe" "C:\UiPath\Automation\Main.xaml"
```
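When chaining tools this way, it is usually worth checking that each step succeeded before starting the next one. The sketch below shows one way to do that from Python with the standard `subprocess` module. The helper name `run_step` is made up, and the demo command simply calls the current Python interpreter so that the example runs anywhere; in practice you would pass the R or UiRobot command line instead.

```python
import subprocess
import sys


def run_step(command):
    """Run one external command and return (exit code, captured standard output)."""
    completed = subprocess.run(command, capture_output=True, text=True)
    return completed.returncode, completed.stdout.strip()


# Demo with the current Python interpreter; replace with for example
# ["Rscript", "/pathto/MyrScript.r"] to run an R script
code, output = run_step([sys.executable, "-c", "print('hello from a child process')"])

if code == 0:
    print("step succeeded:", output)
else:
    print("step failed with exit code", code)
```

By convention, an exit code of 0 means success and any other value signals an error.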
# UIPath
## Introduction
UIPath is an automation tool for handling and manipulating User Interfaces on Windows (RPA), but it is not limited to that.
## Concepts
The idea behind such tools is to mimic what a user does in the visual interface, so that it can be reproduced as many times as needed. You don't need to write code, although UIPath lets you supply VB or C# code to be executed at specific points.
RPA projects in UIPath can be of several types. The two most commonly used are:
- Robot: implements the automation of a process
- Library: describes specific repetitive elements as components that can be reused inside Robots
## Project
A UIPath project is composed of one or more xaml files. A process must contain at least one xaml file called "Main.xaml".
Xaml files contain and describe specific elements which compose a complete process.
In this file type, you can describe:
- Sequence/WorkFlow
- Flowchart
- StateMachine
- GlobalHandler
### Sequence/WorkFlow
A Sequence xaml file chains tasks, called "activities", in a sequential manner.
![](images/Sequence.png)
### FlowChart
A Flowchart represents a sequence of activities as a flow diagram. It is mostly used to represent complex robots as a chain of processes (sequences).
![](images/Flowchart.png)
### State Machine
State Machines are mostly used to react to state changes of elements, at the beginning and at the end of a state change.
### GlobalHandler
GlobalHandlers are used to define, describe and implement error handling in UIPath projects.
You define the error type you want to handle and the way to handle it.
![](images/GlobalHandler.png)
### Activity
Any manipulation in UIPath is defined as an activity (clicking a button, typing in a field, storing a value in a variable, creating a file, etc.). You combine them in sequences to have them executed in a specific order.
## Visual Interface
The UIPath UI is divided into 4 main areas:
- Vertical left panel => Project/Activity/Snippets
- Vertical right panel => Object Repo/Properties/Outline/Resources/Activity Coverage
- Horizontal lower panel => Output/Error List/Find References/Breakpoints
- Central panel => main working panel where you define your processes as chained activities
![](images/UIPathUI.png)
## Examples
### Example 1: Getting Exchange rate
This example automates querying exchange rates (XRates) from a website.
(@) Loop through all rows of an input file "othercurrencies.csv"
```
other_currencies
JPY
USD
CAD
```
(@) Get the exchange rate of each currency against GBP
(@) Store all collected XRates in a file "uiresult.csv"
The resulting file is
```
other_currencies;fxrate
JPY;0,00521
USD;0,80239
CAD;0,58303
```
```
UIPath project
"C:\CatalystRPA_automatisation_HoT-main\Exercices\Exemple 1"
```
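The `uiresult.csv` file shown above uses a semicolon as separator and a comma as decimal mark, so it needs a little care when it is read back by a script in a hybrid solution. A minimal sketch with Python's standard library follows. The file content is inlined so that the example is self-contained, and the helper name `read_fx_rates` is made up for illustration.

```python
import csv
import io

# Content of uiresult.csv as produced by the robot, inlined for the example
UIRESULT = """other_currencies;fxrate
JPY;0,00521
USD;0,80239
CAD;0,58303
"""


def read_fx_rates(text):
    """Return a dict mapping each currency code to its rate as a float."""
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    # Replace the decimal comma with a dot before converting to float
    return {
        row["other_currencies"]: float(row["fxrate"].replace(",", "."))
        for row in reader
    }


rates = read_fx_rates(UIRESULT)
print(rates["USD"])  # 0.80239
```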
### Exercise 2: Combining two files
As already solved in R and Python, the goal is to merge 2 files and double each age value.
* Read A.csv
```
Name,Age
Jack,22
George,30
David,54
Eric,27
Alex,45
```
* Read B.csv
```
Name,Age
Michael,16
Robert,80
Janet,75
Diane,31
Gregory,20
```
* Merge A & B
* Loop through all rows of the merged file and store the doubled age in an additional column called "double_age"
* Write result in "uipath_result.csv"
```
Name,Age,double_age
Jack,22,44
George,30,60
David,54,108
Eric,27,54
Alex,45,90
Michael,16,32
Robert,80,160
Janet,75,150
Diane,31,62
Gregory,20,40
```
```
UIPath project
"C:\CatalystRPA_automatisation_HoT-main\Exercices\Exercice 2\Resolution\UIPath"
```