人工智能 - 跟李沐学AI

跟李沐学AI

B站链接:https://space.bilibili.com/1567748478/channel/seriesdetail?sid=358497

课程官网:https://courses.d2l.ai/zh-v2/

环境配置

在一台新的Ubuntu机器上:

首先更新软件包:sudo apt update

安装gcc之类的东西:sudo apt install build-essential

安装Python:sudo apt install python3.8

安装Miniconda:

先进入miniconda的官方文档:https://docs.conda.io/en/latest/miniconda.html
找到#linux-installers
在里面选中python3.8,复制链接地址
在服务器中将其下载下来:wget 刚刚复制的地址 (例如https://repo.anaconda.com/miniconda/Miniconda3-py38_23.1.0-1-Linux-x86_64.sh
直接bash 刚刚下载下来的.sh文件(例如bash Miniconda3-py38_23.1.0-1-Linux-x86_64.sh
再运行一下bash命令就进入conda环境了

安装所需要的Python包:pip install jupyter d2l torch torchvision(torchvision是pytorch的一个图形库)

下载d2l官网jupyter记事本wget https://zh-v2.d2l.ai/d2l-zh.zip

安装解压用的zip:sudo apt install zip

解压刚刚的zip:unzip d2l-zh.zip

解压出来有三个文件夹(mxnet版本、pytorch版本、transformer版本)

本课程主要使用Pytorch版本。此外,本课程还将使用幻灯片版本的“记事本”:git clone https://github.com/d2l-ai/d2l-zh-pytorch-slides 并进入:cd .\d2l-zh-pytorch-slides\

打开jupyter:jupyter notebook。这样将会在机器上开辟一个8888端口。

如果是在服务器上进行的上述操作,也可以将远端的端口映射到本地ssh -L8888:localhost:8888 [email protected]

可以安装一个插件,pip install rise来以幻灯片格式显示。

Pytorch基础

1
import torch

张量(数组)的创建与基本操作

从0到11的数组:

1
2
x = torch.arange(12)
print(x)

运行结果:

1
tensor([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])

使用列表初始化数组:

1
torch.tensor([[1, 2], [3, 1]])

运行结果:

1
2
tensor([[1, 2],
[3, 1]])

数组形状更改reshape:

注意x自身并不会发生改变,这个函数只是返回一个改变后的副本

1
2
x2 = x.reshape(3, 4)
print(x2)

运行结果:

1
2
3
tensor([[ 0,  1,  2,  3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])

注意虽然b和a不同,但修改b中的元素可能会导致a中元素的改变(可以理解为b是a的另一个视图)

1
2
3
4
5
a = torch.arange(12)
b = a.reshape(3, 4)
print(id(a) == id(b))
b[:] = 2
print(a)

运行结果:

1
2
False
tensor([2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])

获取数组形状shape:

注意.shape是一个“成员”但不是一个“方法”

1
print(x2.shape)

运行结果:

1
torch.Size([3, 4])

获取数组中元素总个数:

1
print(x2.numel())

运行结果:

1
12

生成全是1的数组:

1
2
x = torch.ones(2, 3)
print(x)

运行结果:

1
2
tensor([[1., 1., 1.],
[1., 1., 1.]])

指定数据类型:

1
2
x = torch.ones(2, 3, dtype=int)
print(x)

运行结果:

1
2
tensor([[1, 1, 1],
[1, 1, 1]])

张量间的+-乘除等运算:

1
2
3
4
5
6
7
8
9
x = torch.tensor([1., 2, 4, 8])
y = torch.tensor([2, 2, 2, 2])
print(x + y)
print(x - y)
print(x * y)
print(x / y)
print(x ** y)
print(torch.exp(x)) # e ^ 1, e ^ 2, e ^ 3, e ^ 4
print(x == y)

运行结果:

1
2
3
4
5
6
7
tensor([ 3.,  4.,  6., 10.])
tensor([-1., 0., 2., 6.])
tensor([ 2., 4., 8., 16.])
tensor([0.5000, 1.0000, 2.0000, 4.0000])
tensor([ 1., 4., 16., 64.])
tensor([2.7183e+00, 7.3891e+00, 5.4598e+01, 2.9810e+03])
tensor([False, True, False, False])

向量连接(concatenate):torch.cat

1
2
3
X = torch.arange(12, dtype=torch.float32).reshape((3, 4))
Y = torch.tensor([[2.0, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]])
torch.cat((X, Y), dim=0), torch.cat((X, Y), dim=1)

运行结果:

1
2
3
4
5
6
7
8
9
(tensor([[ 0.,  1.,  2.,  3.],
[ 4., 5., 6., 7.],
[ 8., 9., 10., 11.],
[ 2., 1., 4., 3.],
[ 1., 2., 3., 4.],
[ 4., 3., 2., 1.]]),
tensor([[ 0., 1., 2., 3., 2., 1., 4., 3.],
[ 4., 5., 6., 7., 1., 2., 3., 4.],
[ 8., 9., 10., 11., 4., 3., 2., 1.]]))

默认dim = 0

1
2
3
x = torch.tensor([[1, 2], [3, 4]])
y = torch.tensor([[5, 6]])
print(torch.cat((x, y)))

运行结果:

1
2
3
tensor([[1, 2],
[3, 4],
[5, 6]])

只有拼接的那一维度的长度可以不同,其他维度必须相同(By Let,未完全验证)。例如下面代码会报错:

1
2
3
x = torch.tensor([[1, 2], [3, 4]])
y = torch.tensor([[5, 6]])
torch.cat((x, y), dim=1)

运行结果:

1
2
3
4
5
6
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
Cell In[15], line 3
1 x = torch.tensor([[1, 2], [3, 4]])
2 y = torch.tensor([[5, 6]])
----> 3 torch.cat((x, y), dim=1)

求和:x.sum()

产生一个只有一个元素的张量:

1
2
print(x)
print(x.sum())

运行结果:

1
2
3
tensor([[1, 2],
[3, 4]])
tensor(10)

广播机制:形状不同的向量进行运算

1
2
3
4
5
6
7
8
9
a = torch.arange(3).reshape((3, 1))
b = torch.arange(2).reshape((1, 2))
print(a)
print(b)
print(a + b)
print(a - b)
print(a * b)
print(a / b)
print(a == b)

相当于是把a复制成了3 x 2,把b也复制成了3 x 2

运行结果:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
tensor([[0],
[1],
[2]])
tensor([[0, 1]])
tensor([[0, 1],
[1, 2],
[2, 3]])
tensor([[ 0, -1],
[ 1, 0],
[ 2, 1]])
tensor([[0, 0],
[0, 1],
[0, 2]])
tensor([[nan, 0.],
[inf, 1.],
[inf, 2.]])
tensor([[ True, False],
[False, True],
[False, False]])

同理

取元素/改元素:[第一维列表操作, 第二维列表操作, 第三维]

1
2
3
4
5
6
7
8
x = torch.arange(12).reshape(3, 4)
print(x)
print(x[0:2, 1:3])
x[:, -1] = 0
print(x)
x[0] = -1
print(x)
print(x[1, 2] == x[1][2])

运行结果:

1
2
3
4
5
6
7
8
9
10
11
12
tensor([[ 0,  1,  2,  3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
tensor([[1, 2],
[5, 6]])
tensor([[ 0, 1, 2, 0],
[ 4, 5, 6, 0],
[ 8, 9, 10, 0]])
tensor([[-1, -1, -1, -1],
[ 4, 5, 6, 0],
[ 8, 9, 10, 0]])
tensor(True)

一些操作可能导致为结果重新分配内存:

1
2
3
4
5
6
before = id(Y)
print(before)
Y = X + Y
after = id(Y)
print(after)
print(before == after)

运行结果:

1
2
3
139769251739696
139769252745984
False

那是当然的,X + Y肯定要新赋值给一个元素,不能把X或Y的值给修改掉。

原地执行操作:

1
2
3
4
before = id(Y)
Y += X
after = id(Y)
print(before == after)

运行结果:

1
True

原地执行:

1
2
3
4
5
Z = torch.zeros_like(Y)
before = id(Z)
Z[:] = X + Y
after = id(Z)
print(before == after)

运行结果:

1
True

转为Numpy张量

1
2
A = x.numpy()
print(type(A), type(x))

运行结果:

1
<class 'numpy.ndarray'> <class 'torch.Tensor'>

将大小为1的张量转为Python的标量:

1
2
3
4
x = torch.tensor([1])
print(x, x.item(), float(x), int(x))
y = torch.tensor([1.])
print(y, y.item(), float(y), int(y))

运行结果:

1
2
tensor([1]) 1 1.0 1
tensor([1.]) 1.0 1.0 1

数据预处理

新建一个数据集

1
2
3
4
5
6
7
dataFile = "data.csv"
with open(dataFile, 'w') as f:
f.write('NumRooms,Alley,Price\n')
f.write('NA,Pave,127500\n')
f.write('2,NA,106000\n')
f.write('4,NA,178100\n')
f.write('NA,NA,140000\n')

读取到pandas中

1
2
3
4
import pandas as pd  # pip install pandas
data = pd.read_csv(dataFile)
print(data)
data # 在jupyter中直接调用data输出效果会更好

运行结果:

1
2
3
4
5
   NumRooms Alley   Price
0 NaN Pave 127500
1 2.0 NaN 106000
2 4.0 NaN 178100
3 NaN NaN 140000

获取输入和输出

1
2
# pandas的数据需要.iloc之后才能向torch那样取值
inputs, outputs = data.iloc[:, 0:2], data.iloc[:, 2]

处理缺失值

使用平均值填补NaN:

1
2
inputs = inputs.fillna(inputs.mean())
print(inputs)

运行结果:

1
2
3
4
5
6
7
   NumRooms Alley
0 3.0 Pave
1 2.0 NaN
2 4.0 NaN
3 3.0 NaN
/tmp/ipykernel_13420/2420151946.py:1: FutureWarning: The default value of numeric_only in DataFrame.mean is deprecated. In a future version, it will default to False. In addition, specifying 'numeric_only=None' is deprecated. Select only valid columns or specify the value of numeric_only to silence this warning.
inputs = inputs.fillna(inputs.mean())

警告的意思是说在未来的版本中,numeric_only将不设置默认值。因此手动添加numeric_only=True以消除警告:

1
inputs = inputs.fillna(inputs.mean(numeric_only=True))

将pandas中的Nan视为一个类别

1
2
inputs = pd.get_dummies(inputs, dummy_na=True)
print(inputs)

运行结果(结果中的1和0也有可能被标记为True和False):

1
2
3
4
5
   NumRooms  Alley_Pave  Alley_nan
0 3.0 1 0
1 2.0 0 1
2 4.0 0 1
3 3.0 0 1

将数据转为torch的张量

1
2
3
4
5
6
7
print(inputs.values)
import torch
X, y = torch.tensor(inputs.values), torch.tensor(outputs.values)
# 若上一步被标记为了True和False而非1和0,这一步应强制转一个float类型:
# X, y = torch.tensor(inputs.values.astype(float)), torch.tensor(outputs.values)
print(X)
print(y)

运行结果:

1
2
3
4
5
6
7
8
9
[[3. 1. 0.]
[2. 0. 1.]
[4. 0. 1.]
[3. 0. 1.]]
tensor([[3., 1., 0.],
[2., 0., 1.],
[4., 0., 1.],
[3., 0., 1.]], dtype=torch.float64)
tensor([127500, 106000, 178100, 140000])

注意这里X的dtype是64位浮点数。但其实64位运行较慢,实际使用时经常使用32位浮点数。

1
2
3
X = X.to(dtype=torch.float32)
print(X)
print(X.dtype)

运行结果:

1
2
3
X = X.to(dtype=torch.float32)
print(X)
print(X.dtype)

线性代数基础

矩阵转置x.T:

1
2
3
x = torch.arange(12).reshape(3, 4)
print(x)
print(x.T)

运行结果:

1
2
3
4
5
6
7
tensor([[ 0,  1,  2,  3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
tensor([[ 0, 4, 8],
[ 1, 5, 9],
[ 2, 6, 10],
[ 3, 7, 11]])

简单操作:

  • $c = a + b$ where $c_i=a_i+b_i$
  • $c=\alpha\cdot b$ where $c_i=\alpha b_i$
  • $c=\sin a$ where $c_i=\sin a_i$

长度:

  • $||a||2=[\sum^m{i=1}a_i^2]^{\frac12}$
  • $||a||\geq 0$ for all $a$
  • $||a + b||\leq ||a|| + ||b||$
  • $||a\cdot b||=|a|\cdot||b||$

自动求导

自动求导:requires_grad

1
2
3
4
5
6
7
8
x = torch.arange(4.)
x.requires_grad_(True) # 等价于 x = torch.arange(4., requires_grad=True)
print(x)
y = 2 * torch.dot(x, x) # y = 2x^2
print(y)
y.backward() # 反向求导
x.grad
print(x.grad == 4 * x) # y' = 4x

运行结果:

1
2
3
tensor([0., 1., 2., 3.], requires_grad=True)
tensor(28., grad_fn=<MulBackward0>)
tensor([True, True, True, True])

清除梯度:x.grad.zero_

默认情况torch会把梯度累积起来,因此计算下一个梯度是时候记得清除掉之前的梯度

1
2
3
4
5
6
7
8
9
10
11
x = torch.arange(4., requires_grad=True)
y = 2 * torch.dot(x, x)
y.backward()
print(x.grad)
y = torch.dot(x, x)
y.backward()
print(x.grad) # 两个grad的累加
x.grad.zero_()
y = torch.dot(x, x)
y.backward()
print(x.grad) # y=x^2的真正的grad

运行结果:

1
2
3
tensor([ 0.,  4.,  8., 12.])
tensor([ 0., 6., 12., 18.])
tensor([0., 2., 4., 6.])

将某些计算结果移动到记录的计算图之外:y.detach()

1
2
3
4
5
6
7
8
x = torch.arange(4., requires_grad=True)
y = x * x
print(y)
u = y.detach() # u视为一个对x的常数
print(u)
z = u * x
z.sum().backward()
print(x.grad == u)

运行结果:

1
2
3
tensor([0., 1., 4., 9.], grad_fn=<MulBackward0>)
tensor([0., 1., 4., 9.])
tensor([True, True, True, True])

但注意y.detach()不改变y,y仍是关于x的函数

1
2
3
x.grad.zero_()
y.sum().backward()
print(x.grad == 2 * x)

运行结果:

1
tensor([True, True, True, True])

线性回归

线性回归手动实现

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
import random
import torch
from d2l import torch as d2l

def synthetic_data(w, b, num_examples):
"""生成y = Xw + b + 噪声"""
X = torch.normal(0, 1, (num_examples, len(w))) # 生成均值是0,标准差是1的正态分布。每个值的shape为(num_examples, len(w))
y = torch.matmul(X, w) + b
y += torch.normal(0, 0.01, y.shape)
return X, y.reshape((-1, 1))

true_w = torch.tensor([2, -3.4])
true_b = 4.2
features, labels = synthetic_data(true_w, true_b, 1000) # features是1000x2的矩阵,labels是1000x1的矩阵

def data_iter(batch_size, features, labels):
num_examples = len(features)
indices = list(range(num_examples))
random.shuffle(indices)
for i in range(0, num_examples, batch_size):
batch_indices = torch.tensor(indices[i:min(i + batch_size, num_examples)]) # min(i + batch_size, num_examples):防止num_example不是batch_size整数倍
yield features[batch_indices], labels[batch_indices]

batch_size = 10
w = torch.normal(0, 0.01, size=(2, 1), requires_grad=True)
b = torch.zeros(1, requires_grad=True)

def linreg(X, w, b):
"""线性回归模型"""
# print(X.shape, w.shape)
# print(torch.matmul(X, w).shape)
return torch.matmul(X, w) + b

def squared_loss(y_hat, y):
"""均方损失"""
# print(y_hat.shape)
# print(y.shape)
# print(((y_hat - y.reshape(y_hat.shape)) ** 2 / 2).shape)
return (y_hat - y.reshape(y_hat.shape)) ** 2 / 2

def sgd(params, lr, batch_size):
"""小批量梯度下降算法"""
with torch.no_grad():
for param in params:
param -= lr * param.grad / batch_size
param.grad.zero_()
lr = 0.03
num_epochs = 3
net = linreg
loss = squared_loss

for epoch in range(num_epochs):
for X, y in data_iter(batch_size, features, labels):
l = loss(net(X, w, b), y)
l.sum().backward()
sgd([w, b], lr, batch_size)
with torch.no_grad():
train_l = loss(net(features, w, b), labels)
print(f'epoch {epoch + 1}, loss {float(train_l.mean()):f}')

print(f'w的估计误差: {true_w - w.reshape(true_w.shape)}')
print(f'b的估计误差: {true_b - b}')

运行结果:

1
2
3
4
5
epoch 1, loss 0.038151
epoch 2, loss 0.000152
epoch 3, loss 0.000048
w的估计误差: tensor([-0.0003, -0.0008], grad_fn=<SubBackward0>)
b的估计误差: tensor([0.0008], grad_fn=<RsubBackward1>)

借助Pytorch实现

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
import numpy as np
import torch
from torch.utils import data

def synthetic_data(w, b, num_examples):
"""生成y = Xw + b + 噪声"""
X = torch.normal(0, 1, (num_examples, len(w))) # 生成均值是0,标准差是1的正态分布。每个值的shape为(num_examples, len(w))
y = torch.matmul(X, w) + b
y += torch.normal(0, 0.01, y.shape)
return X, y.reshape((-1, 1))

true_w = torch.tensor([2, -3.4])
true_b = 4.2
features, labels = synthetic_data(true_w, true_b, 1000)

def load_array(data_arrays, batch_size, is_train=True):
"""构造一个PyTorch数据迭代器"""
dataset = data.TensorDataset(*data_arrays)
return data.DataLoader(dataset, batch_size, shuffle=is_train)

batch_size = 10
data_iter = load_array((features, labels), batch_size)

from torch import nn
net = nn.Sequential(nn.Linear(2, 1))
net[0].weight.data.normal_(0, 0.01)
net[0].bias.data.fill_(0)
loss = nn.MSELoss()
trainer = torch.optim.SGD(net.parameters(), lr=0.03)

num_epochs = 3
for epoch in range(num_epochs):
for X, y in data_iter:
l = loss(net(X), y)
trainer.zero_grad()
l.backward()
trainer.step()
l = loss(net(features), labels)
print(f'epoch {epoch + 1}, loss {l:f}')


w = net[0].weight.data
print('w的估计误差:', true_w - w.reshape(true_w.shape))
b = net[0].bias.data
print('b的估计误差:', true_b - b)

运行结果:

1
2
3
4
5
epoch 1, loss 0.000258
epoch 2, loss 0.000101
epoch 3, loss 0.000100
w的估计误差: tensor([-0.0003, 0.0005])
b的估计误差: tensor([0.0012])

1

运行结果:

1

1

运行结果:

1

1

运行结果:

1

TODO: 等完成地差不多了发布至CSDN

原创不易,转载请附上原文链接哦~
https://blog.letmefly.xyz/2023/03/15/Other-AI-LearnAIWithLiMu/


人工智能 - 跟李沐学AI
https://blog.letmefly.xyz/2023/03/15/Other-AI-LearnAIWithLiMu/
作者
Tisfy
发布于
2023年3月15日
许可协议