python-scraping-notes

Posted on 2018-01-02 | Updated: 2018-01-03 | Categories: programming, python

My tips on web scraping with Python.

good to read

useful tools

Chrome extension SelectorGadget: an easy way to get the XPath of an element

useful python modules

tips

file IO

Create a new file

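def write_file(path, data, mode='a'):
    # mode 'a' appends to the file and creates it if it does not exist
    with open(path, mode, encoding='utf8') as f:
        f.write(data)

# Delete the contents of a file
def delete_file_contents(path):
    # opening in 'w' mode truncates the file to zero length
    open(path, 'w').close()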
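
A quick way to see the two helpers working together, as a minimal sketch. It repeats both functions so it runs on its own; the temporary directory and the file name `notes.txt` are made up for the demonstration.

```python
import os
import tempfile


def write_file(path, data, mode='a'):
    # mode 'a' appends and creates the file if it does not exist yet
    with open(path, mode, encoding='utf8') as f:
        f.write(data)


def delete_file_contents(path):
    # opening in 'w' mode truncates the file to zero length
    open(path, 'w').close()


# hypothetical scratch location, only for this demo
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, 'notes.txt')

write_file(demo_path, 'first line\n')
write_file(demo_path, 'second line\n')  # appended, not overwritten
with open(demo_path, encoding='utf8') as f:
    after_append = f.read()

delete_file_contents(demo_path)
size_after_clear = os.path.getsize(demo_path)
```

Because the default mode is `'a'`, calling `write_file` repeatedly on the same path keeps growing the file; pass `mode='w'` to overwrite instead.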

get elements

import lxml.html
dom = lxml.html.fromstring(src)

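def get_info_xpath(dom, xpath):
    """Return the text of every node matched by xpath, one per line."""
    result = ""
    for div in dom.xpath(xpath):
        try:
            # div.text is None when the element has no leading text
            result = result + div.text + "\n"
        except TypeError:
            # fall back to the text of the element and all its descendants
            result = result + div.text_content() + "\n"
    return result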
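
To see why the fallback matters, here is a small sketch run on a made-up HTML snippet (the markup and the XPath `//li` are assumptions for illustration only). The first `<li>` starts with a child tag, so its `.text` is `None` and the function falls back to `text_content()`.

```python
import lxml.html


def get_info_xpath(dom, xpath):
    result = ""
    for node in dom.xpath(xpath):
        try:
            result = result + node.text + "\n"
        except TypeError:
            # node.text is None when the element starts with a child tag
            result = result + node.text_content() + "\n"
    return result


# made-up sample markup, not from a real page
src = "<ul><li><b>bold</b> item</li><li>plain item</li></ul>"
dom = lxml.html.fromstring(src)
info = get_info_xpath(dom, "//li")
print(info)
```

One caveat: when an element has leading text followed by child tags, `.text` returns only the leading part, so calling `text_content()` directly may be the safer choice if the full text is always wanted.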

useful code block

manipulate directory and file paths
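
A minimal sketch for this heading, using the standard library's `pathlib`, which is one common way to handle directories and file paths. The directory and file names (`pages`, `item1.html`) are placeholders and not from the original notes.

```python
import tempfile
from pathlib import Path

base = Path(tempfile.mkdtemp())      # scratch directory for the demo
out_dir = base / 'pages'             # join path parts with the / operator
out_dir.mkdir(parents=True, exist_ok=True)  # no error if it already exists

page = out_dir / 'item1.html'
page.write_text('<html></html>', encoding='utf8')

name = page.name      # file name with extension
stem = page.stem      # file name without extension
suffix = page.suffix  # extension, including the dot

# list the saved pages in the directory
html_files = sorted(p.name for p in out_dir.glob('*.html'))
```

`mkdir(parents=True, exist_ok=True)` is handy in scrapers that save pages into nested folders, since the script can be rerun without checking whether the folder is already there.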

navigate with CSS selectors and conditions

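from bs4 import BeautifulSoup

soup = BeautifulSoup(src, 'html.parser')  # src: the page's HTML source
# print the href of every <a> inside a <div> that contains <span class="cccc">
for link in soup.select('div a'):
    if link.find_all('span', class_='cccc'):
        print(link.get('href'))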
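
The same filter, run on a made-up snippet so the effect of the condition is visible. The markup and the `/keep`, `/skip`, `/plain` links are invented for illustration; only the link that wraps a `<span class="cccc">` should survive.

```python
from bs4 import BeautifulSoup

# made-up sample markup, not from a real page
src = """
<div>
  <a href="/keep"><span class="cccc">match</span></a>
  <a href="/skip"><span class="other">no match</span></a>
  <a href="/plain">no span</a>
</div>
"""
soup = BeautifulSoup(src, 'html.parser')

hrefs = []
for link in soup.select('div a'):
    # find_all returns an empty (falsy) list when nothing matches
    if link.find_all('span', class_='cccc'):
        hrefs.append(link.get('href'))
print(hrefs)
```

Collecting the links into a list instead of printing them makes it easier to feed them into the next request step.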

Google Translate free API

from googletrans import Translator

trans = Translator(service_urls=[
    'translate.google.com',
    'translate.google.co.jp',
])
result = trans.translate(text, dest='zh-CN')  # text: the string to translate
result.text  # final text result

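Post title: python-scraping-notes

Author: caili-zhang

Published: 2018-01-02 10:01

Last updated: 2018-01-03 08:01

Original link: https://caili-zhang.github.io/2018/01/02/python-scraping-notes/

License: Attribution-NonCommercial-NoDerivatives 4.0 International. Please keep the original link and author when reposting.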
