update structure
NeverBehave committed Jul 19, 2022
1 parent e7914f5 commit e9b824d
Showing 22 changed files with 445 additions and 7,250 deletions.
36 changes: 36 additions & 0 deletions .github/workflows/docker-telegram.yml
@@ -0,0 +1,36 @@
name: docker-telegram-bot

on:
  push:
    branches:
      - 'master'

jobs:
  docker:
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: telegram/
    steps:
      -
        name: Set up QEMU
        uses: docker/setup-qemu-action@v2
      -
        name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2
      -
        name: Login to DockerHub
        uses: docker/login-action@v2
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}
      - uses: actions/checkout@v2
      -
        name: Copy Resource File
        run: cp ../xi.json .
      -
        name: Build and push
        uses: docker/build-push-action@v3
        with:
          push: true
          tags: neverbehave/xixi-haha-bot:latest
2 changes: 1 addition & 1 deletion .gitignore
@@ -1 +1 @@
.DS_Store
.DS_Store
25 changes: 0 additions & 25 deletions .gitlab-ci.yml

This file was deleted.

76 changes: 3 additions & 73 deletions README.md
@@ -20,87 +20,17 @@ http://jhsjk.people.cn

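Hmm.

The API is live (for usage, see [here](https://blog.lwl12.com/read/hitokoto-api.html)): https://fission.never.eu.org/yiyan/xixi-haha

Telegram Bot: [@xixi_haha_bot](https://t.me/xixi_haha_bot)

Supports `inline mode`; hosted on AWS Lambda.

## Anything else to say?

This version does not focus much on sentence splitting; it mainly cleans up the content. The scraping script (quick and dirty) is included here for reference: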

```python3
import json
from bs4 import BeautifulSoup
import requests


li = []
sentences = []

# Get ALL Articles
for i in range(1, 16):
    print(i)
    result = requests.get(
        "http://jhsjk.people.cn/result/{}?form=706&else=501".format(i))
    soup = BeautifulSoup(result.content)
    arr = soup.find_all('ul', class_="p1_2")[0].find_all('a')
    for a in arr:
        li.append('http://jhsjk.people.cn/' + a['href'])

print('start fetching pages')
for addr in li:
    print(addr)
    result = requests.get(addr)
    soup = BeautifulSoup(result.content)
    arr = soup.find_all(class_="d2txt_con")[0].find_all('p')

    for i in arr:
        sentences.append(i.text.strip())


with open('xi.json', 'w') as outfile:
    json.dump(sentences, outfile, indent=2, ensure_ascii=False)

```

Cleanup:
```python3
import json

with open('xi-v1.json') as f:
    data = json.load(f)

sentences = []

# extend data
# \r\r\n is replaced by hand
# ((新)(.*)) for attr
# (新华社记者)(.*摄)
# for i in data:
#     li = i.split('\n')
#     for s in li:
#         s = s.strip()
#         if s is not "":
#             if s.startswith('(新') is False and s.startswith('(') is False and s.startswith('(2') is False and s.startswith('《 ') is False and s.startswith('>') is False:
#                 sentences.append(s)

for s in data:
    if s.startswith('(人') is False and s.startswith('(') is False and s.startswith('(2') is False and s.startswith('') is False and s.startswith('(2') is False:
        sentences.append(s)
    else:
        print(s)

# a = list(set(sentences))

with open('xi-v2.json', 'w') as outfile:
    json.dump(sentences, outfile, indent=2, ensure_ascii=False)
```
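This version does not focus much on sentence splitting; it mainly cleans up the content. The scraping scripts (quick and dirty) are in `parse/` for reference.

Server
## Server

`index.js``Dockerfile`
`web/`

## That's it?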

1 change: 0 additions & 1 deletion lambda/.env.example

This file was deleted.

7,084 changes: 0 additions & 7,084 deletions lambda/xi.json

This file was deleted.

65 changes: 0 additions & 65 deletions lambda/yarn.lock

This file was deleted.

29 changes: 29 additions & 0 deletions parse/cleanup.py
@@ -0,0 +1,29 @@
import json

with open('xi-v1.json') as f:
    data = json.load(f)

sentences = []

# extend data
# \r\r\n is replaced by hand
# ((新)(.*)) for attr
# (新华社记者)(.*摄)
# for i in data:
#     li = i.split('\n')
#     for s in li:
#         s = s.strip()
#         if s is not "":
#             if s.startswith('(新') is False and s.startswith('(') is False and s.startswith('(2') is False and s.startswith('《 ') is False and s.startswith('>') is False:
#                 sentences.append(s)

for s in data:
    if s.startswith('(人') is False and s.startswith('(') is False and s.startswith('(2') is False and s.startswith('《 ') is False and s.startswith('(2') is False:
        sentences.append(s)
    else:
        print(s)

# a = list(set(sentences))

with open('xi-v2.json', 'w') as outfile:
    json.dump(sentences, outfile, indent=2, ensure_ascii=False)
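The filter loop in `parse/cleanup.py` could be written more compactly, since `str.startswith` accepts a tuple of prefixes. The following is only a sketch and not part of the commit: the names `SKIP_PREFIXES` and `partition_sentences` are made up for illustration, and the ASCII `'(2'` prefix is left out on the assumption that it is already covered by `'('`.

```python
# Prefixes that the committed script uses to drop bylines, dates and titles.
# The ASCII '(2' check is omitted because any such string also starts with '('.
SKIP_PREFIXES = ('(人', '(', '(2', '《 ')


def partition_sentences(data, skip_prefixes=SKIP_PREFIXES):
    """Split entries into (kept, dropped) while preserving their order."""
    kept, dropped = [], []
    for s in data:
        # startswith() with a tuple is True if any one of the prefixes matches.
        if s.startswith(skip_prefixes):
            dropped.append(s)
        else:
            kept.append(s)
    return kept, dropped
```

With this shape, `kept` would be written to `xi-v2.json` and `dropped` printed for manual review, mirroring what the committed script does.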
31 changes: 31 additions & 0 deletions parse/download.py
@@ -0,0 +1,31 @@
import json
from bs4 import BeautifulSoup
import requests


li = []
sentences = []

# Get ALL Articles
for i in range(1, 16):
    print(i)
    result = requests.get(
        "http://jhsjk.people.cn/result/{}?form=706&else=501".format(i))
    soup = BeautifulSoup(result.content)
    arr = soup.find_all('ul', class_="p1_2")[0].find_all('a')
    for a in arr:
        li.append('http://jhsjk.people.cn/' + a['href'])

print('start fetching pages')
for addr in li:
    print(addr)
    result = requests.get(addr)
    soup = BeautifulSoup(result.content)
    arr = soup.find_all(class_="d2txt_con")[0].find_all('p')

    for i in arr:
        sentences.append(i.text.strip())


with open('xi.json', 'w') as outfile:
    json.dump(sentences, outfile, indent=2, ensure_ascii=False)
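`parse/download.py` builds article URLs by string concatenation, which can yield a double slash if an `href` already starts with `/`. A possible alternative, sketched here with the standard library only, resolves each `href` with `urllib.parse.urljoin`. The helper names (`list_page_urls`, `article_url`, `unique_in_order`) are hypothetical, and whether the site's links are relative or absolute is an assumption, not something the commit shows.

```python
from urllib.parse import urljoin

BASE_URL = "http://jhsjk.people.cn/"
LIST_URL_TEMPLATE = BASE_URL + "result/{}?form=706&else=501"


def list_page_urls(first=1, last=15):
    """Index-page URLs for pages first..last (the script requests 1..15)."""
    return [LIST_URL_TEMPLATE.format(page) for page in range(first, last + 1)]


def article_url(href):
    """Resolve an <a href> value against the site root."""
    # urljoin handles relative paths, root-relative paths and absolute URLs.
    return urljoin(BASE_URL, href)


def unique_in_order(urls):
    """Drop duplicate URLs while keeping first-seen order."""
    return list(dict.fromkeys(urls))
```

Deduplicating before the second loop would avoid fetching the same article twice if it appears on more than one index page.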
3 changes: 3 additions & 0 deletions telegram/.env.example
@@ -0,0 +1,3 @@
BOT_TOKEN=
WEBHOOK_PATH=/secret-path
WEBHOOK_PORT=5000
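For illustration, the three variables in `telegram/.env.example` could be read and validated roughly as follows. This is a Python sketch, not the bot's actual code (the bot itself runs on Node); `WebhookConfig` and `load_config` are hypothetical names, and treating an empty `BOT_TOKEN` as an error is an assumption.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class WebhookConfig:
    bot_token: str
    webhook_path: str
    webhook_port: int


def load_config(env=os.environ):
    """Read the webhook settings from a mapping of environment variables."""
    token = env.get("BOT_TOKEN", "")
    if not token:
        # The example file leaves BOT_TOKEN empty, so it must be supplied.
        raise ValueError("BOT_TOKEN must be set")
    # Fall back to the defaults shown in .env.example.
    path = env.get("WEBHOOK_PATH", "/secret-path")
    if not path.startswith("/"):
        path = "/" + path
    port = int(env.get("WEBHOOK_PORT", "5000"))
    return WebhookConfig(token, path, port)
```

Passing a plain dict instead of `os.environ` keeps the function easy to check in isolation.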
File renamed without changes.
21 changes: 21 additions & 0 deletions telegram/Dockerfile
@@ -0,0 +1,21 @@
FROM node:lts-alpine3.16

# Create app directory
WORKDIR /usr/src/app

ENV BOT_TOKEN=
ENV WEBHOOK_PATH /secret-path
ENV WEBHOOK_PORT 5000

# Install app dependencies
# A wildcard is used to ensure both package.json AND package-lock.json are copied
# where available (npm@5+)
COPY package*.json ./

RUN npm install

COPY . .

EXPOSE 5000

CMD [ "npm", "run", "start:webhook" ]
File renamed without changes.
File renamed without changes.
File renamed without changes.