python-scraping-notes

My tips on web scraping with Python.

good to read

useful tools

  • SelectorGadget (Chrome extension): an easy way to get the XPath of an element

    useful Python modules

tips

file IO

Create a new file (or append to an existing one)

def write_file(path, data, mode='a'):
    with open(path, mode, encoding='utf8') as f:
        f.write(data)

# Delete the contents of a file
def delete_file_contents(path):
    open(path, 'w').close()
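
As a rough usage sketch, the two helpers above can be tried against a throwaway file. The helpers are repeated so the snippet runs on its own; `read_file`, the temporary directory and the file name `scraped.txt` are assumptions made for this illustration only.

```python
import os
import tempfile


def write_file(path, data, mode='a'):
    with open(path, mode, encoding='utf8') as f:
        f.write(data)


def delete_file_contents(path):
    open(path, 'w').close()


def read_file(path):
    # Hypothetical helper, only used here to inspect the results.
    with open(path, encoding='utf8') as f:
        return f.read()


# Work in a temporary directory so nothing real is overwritten.
demo_path = os.path.join(tempfile.mkdtemp(), 'scraped.txt')

# The default mode 'a' creates the file if needed and appends on every call.
write_file(demo_path, 'first line\n')
write_file(demo_path, 'second line\n')
after_append = read_file(demo_path)

# Passing mode='w' replaces whatever was there before.
write_file(demo_path, 'fresh start\n', mode='w')
after_overwrite = read_file(demo_path)

# Truncate the file to zero length.
delete_file_contents(demo_path)
after_clear = read_file(demo_path)
```

Appending by default is convenient when results arrive page by page during a scrape, but it also means a second run adds to the old output unless the file is cleared first.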

get elements

import lxml.html

# src is the HTML source of the page, as a string
dom = lxml.html.fromstring(src)

def get_info_xpath(dom, xpath):
    """Return the text of every node matching xpath, one per line."""
    result = ""
    for div in dom.xpath(xpath):
        try:
            result = result + div.text + "\n"
        except TypeError:
            # div.text is None when the element starts with a child tag
            result = result + div.text_content() + "\n"
    return result
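
To see what the helper returns, here is a small sketch run on an inline HTML string instead of a downloaded page. The sample markup, the class names and the XPath expression are made up for the example, and the helper is restated with an explicit `None` check, which should behave the same way for element nodes.

```python
import lxml.html

# Made-up markup standing in for a downloaded page.
SAMPLE_HTML = """
<html><body>
  <div class="item">first</div>
  <div class="item"><b>second</b> entry</div>
  <div class="other">ignored</div>
</body></html>
"""


def get_info_xpath(dom, xpath):
    """Return the text of every element matching xpath, one per line."""
    result = ""
    for node in dom.xpath(xpath):
        # node.text is None when the element starts with a child tag,
        # so fall back to the text of the whole subtree.
        text = node.text if node.text is not None else node.text_content()
        result += text + "\n"
    return result


dom = lxml.html.fromstring(SAMPLE_HTML)
titles = get_info_xpath(dom, '//div[@class="item"]')
print(titles)
```

The second `div` starts with a `<b>` tag, so its `.text` is `None` and the fallback to `text_content()` is what picks up "second entry".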

useful code block

  • manipulate directory and file paths
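
The original snippet for this item is not shown here, so the following is only a minimal sketch of common path handling using the standard-library `pathlib` module. The directory layout (`output/pages`) and the file name `page_001.html` are assumptions chosen for illustration.

```python
from pathlib import Path
import tempfile

# Use a temporary directory as the base so the example is harmless to run.
base_dir = Path(tempfile.mkdtemp())

# Build nested paths with the / operator and create the directories.
out_dir = base_dir / "output" / "pages"
out_dir.mkdir(parents=True, exist_ok=True)

# Save one scraped page into that directory.
page_path = out_dir / "page_001.html"
page_path.write_text("<html></html>", encoding="utf8")

# Split a path into its parts.
name = page_path.name      # file name with extension
stem = page_path.stem      # file name without extension
suffix = page_path.suffix  # extension, including the dot

# List the saved HTML files.
html_files = sorted(p.name for p in out_dir.glob("*.html"))
```

`mkdir(parents=True, exist_ok=True)` is handy in scrapers because it can be called on every run without checking whether the output directory already exists.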
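# soup is assumed to be a BeautifulSoup object for the page being scraped
# print the href of every link inside a div that contains a <span class="cccc">
for link in soup.select('div a'):
    if link.find_all("span", class_="cccc"):
        print(link.get('href'))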

free Google Translate API (googletrans)

from googletrans import Translator

# text is assumed to hold the string to translate
trans = Translator(service_urls=[
    'translate.google.com',
    'translate.google.co.jp',
])
result = trans.translate(text, dest='zh-CN')
result.text  # final text result

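Title: python-scraping-notes

Author: caili-zhang

Published: 2018-01-02, 10:01

Last updated: 2018-01-03, 08:01

Original link: https://caili-zhang.github.io/2018/01/02/python-scraping-notes/

License: Attribution-NonCommercial-NoDerivatives 4.0 International. Please keep the original link and author when reposting.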