Processing Data with Python (Part 1)


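Processing CSV data

The CSV file format

Comma-separated values (CSV), sometimes called character-separated values because the separator need not be a comma, is a file format that stores tabular data (numbers and text) as plain text. Plain text means the file is a sequence of characters and contains no data that has to be interpreted as binary numbers. A CSV file consists of any number of records separated by some kind of line break; each record consists of fields separated by another character or string, most commonly a comma or a tab. Usually all records have exactly the same sequence of fields.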
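To make the description above concrete, here is a minimal sketch that parses the same two records once with commas and once with tabs, using Python's standard csv module on in-memory text. The sample records and the helper name parse_delimited are made up for illustration.

```python
import csv
import io


def parse_delimited(text, delimiter=","):
    """Split delimiter-separated text into a list of records (lists of fields)."""
    # csv.reader accepts any iterable of lines; StringIO stands in for a file.
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


# Hypothetical sample data: the same two records, comma- and tab-separated.
comma_text = "name,qty\napple,3\n"
tab_text = "name\tqty\napple\t3\n"

print(parse_delimited(comma_text))
print(parse_delimited(tab_text, delimiter="\t"))
```

Both calls produce the same list of records, because only the separator character differs between the two inputs.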

A sample CSV record

27,20,14,15,14,12,94,64,37,1015,1013,1009,7,5,2,21,8,35,0.00,152

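CSV files can also be opened directly in Excel or similar software, where they appear as the familiar table layout.

A common way to read the data

import codecs

lineText = list()
with codecs.open("test.csv", encoding="utf-8") as f:
    for line in f.readlines():
        fields = line.rstrip("\r\n").split(",")  # drop the trailing newline before splitting
        print(fields)  # print each line's data as a list
        lineText.append(fields)
print(lineText)  # all lines collected as elements of one list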

Processing CSV data with the csv module

import csv

fileName = "test.csv"
# newline="" lets the csv module handle line endings itself
with open(fileName, newline="", encoding="utf-8") as fcsv:
    linecsv = csv.reader(fcsv)
    rows = [row for row in linecsv]
print(rows)
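Both approaches return every field as a string. If numbers are needed, each field has to be converted explicitly. The sketch below does this for the sample record shown earlier; the helper name to_number is hypothetical, and the record is held in memory instead of being read from test.csv.

```python
import csv
import io


def to_number(field):
    """Return an int or float for numeric text, otherwise the original string."""
    for convert in (int, float):
        try:
            return convert(field)
        except ValueError:
            continue
    return field


# The sample record shown earlier, held in memory instead of test.csv.
sample = "27,20,14,15,14,12,94,64,37,1015,1013,1009,7,5,2,21,8,35,0.00,152\n"

rows = [[to_number(field) for field in row] for row in csv.reader(io.StringIO(sample))]
print(rows[0])
```

Trying int before float keeps whole numbers as integers and only falls back to float for values such as 0.00.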

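Processing Excel data

Python has third-party libraries for working with Excel files: xlrd, xlwt, xlutils, and pyExcelerator. Beyond these, the win32com and openpyxl modules can also be used. We mainly use xlrd, xlwt, and xlutils, and occasionally pyExcelerator.

xlrd: can only read Excel files, not write them. Note that xlrd 2.0 and later read only the old .xls format, so opening an .xlsx file requires an older xlrd release (or openpyxl).
xlwt: can write files (in .xls format), but cannot modify an existing Excel file.
xlutils: can modify an existing Excel file.
pyExcelerator: similar to xlwt, and can also be used to generate Excel files.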
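Since xlutils is the one library in this list without an example in this post, here is a minimal sketch of how it is typically combined with xlrd to change a cell in an existing file. The function name set_cell and its arguments are hypothetical, and the sketch assumes an .xls workbook and that xlrd and xlutils are installed.

```python
def set_cell(path, row, col, text):
    """Write text into one cell of the first sheet of an existing .xls file."""
    # Imported inside the function so the sketch can be loaded even when
    # the third-party packages (xlrd, xlutils) are not installed.
    import xlrd
    from xlutils.copy import copy

    # formatting_info=True asks xlrd to keep cell formatting (.xls only).
    readable = xlrd.open_workbook(path, formatting_info=True)
    writable = copy(readable)      # an xlwt workbook holding the same data
    sheet = writable.get_sheet(0)  # writable view of the first sheet
    sheet.write(row, col, text)
    writable.save(path)            # overwrite the original file
```

A call such as set_cell("hello.xls", 3, 0, "hello world4") would then add a fourth line to a file that already exists, which neither xlrd nor xlwt can do on its own.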

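Reading a sheet row by row

import xlrd

def readExcel():
    # Note: xlrd 2.0+ opens only .xls files; reading .xlsx needs xlrd < 2.0
    data = xlrd.open_workbook('test.xlsx')
    table = data.sheets()[0]  # open the first sheet
    nrows = table.nrows  # number of rows in the sheet
    for i in range(nrows):  # loop over the rows and print each one
        print(table.row_values(i))  # row_values returns the values of one row

if __name__ == '__main__':
    readExcel()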

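Reading a sheet column by column

import xlrd

data = xlrd.open_workbook("whsc.xlsx")
table2 = data.sheet_by_name("域名")  # name of the sheet tab ("域名" means "domain name")
for col in range(table2.ncols):
    print(table2.col_values(col))  # col_values returns the values of one column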

Creating an Excel file and writing to it

import xlwt

excel = xlwt.Workbook()
# Create three sheets. cell_overwrite_ok is an argument of add_sheet (not of write)
# and controls whether the same cell may be written more than once.
sheet1 = excel.add_sheet("sheet1", cell_overwrite_ok=True)
sheet2 = excel.add_sheet("sheet2")
sheet3 = excel.add_sheet("sheet3")
# Write data only to the first sheet. The arguments are row, column, content.
sheet1.write(0, 0, "hello world1")
sheet1.write(1, 0, "hello world2")
sheet1.write(2, 0, "hello world3")
# xlwt writes the old .xls format, so the file is saved with an .xls extension
excel.save("hello.xls")
print("Created hello.xls")

Using styles and fonts

import xlwt

excel = xlwt.Workbook()
# Create three sheets
sheet1 = excel.add_sheet("sheet1")
sheet2 = excel.add_sheet("sheet2")
sheet3 = excel.add_sheet("sheet3")
# Initialize a style
style = xlwt.XFStyle()
# Create a font for the style
font = xlwt.Font()
font.name = 'Times New Roman'  # font name
font.bold = True  # bold or not
# Attach the font to the style
style.font = font
# Use the style when writing a cell
sheet3.write(0, 1, 'some bold Times text', style)
# Save the file (xlwt writes .xls); an existing file with the same name is overwritten
excel.save('hello.xls')
print('Created hello.xls')

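Converting files to PDF

At work you will sometimes need to convert HTML to PDF. The third-party pdfkit module, a wrapper around the wkhtmltopdf tool, offers three ways to do this: from an HTML file, from a URL, and from a string. Install it and it is ready to use.

Installing the module

pip install pdfkit

A simple example

import pdfkit

pdfkit.from_file("hello.html", "1.pdf")  # convert an HTML file to PDF
pdfkit.from_url("www.baidu.com", "2.pdf")  # convert a web page to PDF directly from its URL
pdfkit.from_string("hello world", "3.pdf")  # convert a string to PDF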

Scraping the tutorial on apelearn and converting it to PDF

import os
import re
import pdfkit
import requests

if not os.path.exists("aminglinux"):
    os.mkdir("aminglinux")  # create a directory to hold the generated PDF files
os.chdir("aminglinux")  # switch to that directory

url = "http://www.apelearn.com/study_v2/"
s = requests.session()
text = s.get(url).text
reg = re.compile(r'<li class=\"toctree-l1\"><a class=\"reference internal\" href=\"(.*)\">.*<\/a><\/li>')
result = reg.findall(text)
res = list(set(result))  # remove duplicate links
# Path to the wkhtmltopdf executable (Windows example); created once, outside the loop
config = pdfkit.configuration(wkhtmltopdf=r"C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe")
for i in res:
    purl = "{0}{1}".format(url, i)
    print(purl)
    pdfFileName = i.replace("html", "pdf")
    print(pdfFileName)
    try:
        pdfkit.from_url(purl, pdfFileName, configuration=config)
    except Exception:  # skip pages that fail to convert
        continue

Result:

chapter1.pdf chapter2.pdf chapter3.pdf chapter4.pdf chapter5.pdf ...... ......

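Note: pdfkit relies on the separate wkhtmltopdf program. On Windows you need to install wkhtmltopdf yourself, otherwise the script raises an error.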
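The link-extraction step of the scraping script can be tried without network access or wkhtmltopdf. The sketch below runs the same pattern against a made-up fragment of HTML; the real markup of the index page may differ, and the variable names are only for illustration.

```python
import os
import re

# Same pattern as in the script above (the backslash escapes before quotes
# and slashes are optional in a Python raw string, so they are omitted here).
link_pattern = re.compile(
    r'<li class="toctree-l1"><a class="reference internal" href="(.*)">.*</a></li>'
)

# Hypothetical fragment of the index page, with one duplicated entry.
html = """
<li class="toctree-l1"><a class="reference internal" href="chapter1.html">Chapter 1</a></li>
<li class="toctree-l1"><a class="reference internal" href="chapter2.html">Chapter 2</a></li>
<li class="toctree-l1"><a class="reference internal" href="chapter1.html">Chapter 1</a></li>
"""

# sorted(set(...)) drops duplicates and gives a stable order.
links = sorted(set(link_pattern.findall(html)))

# splitext swaps only the extension, unlike str.replace("html", "pdf"),
# which would also change "html" appearing elsewhere in the name.
pdf_names = [os.path.splitext(link)[0] + ".pdf" for link in links]

print(links)
print(pdf_names)
```

Sorting the de-duplicated links is optional, but it makes the order of the generated files predictable, which a plain set does not guarantee.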

Reposted from: https://www.cnblogs.com/yangjian319/p/9157977.html
