原

Pytorch 气温预测

1723人阅读 2023/12/21 11:34 总访问：3233490 评论：0 收藏：0 手机

分类: pytorch

![](https://img.tnblog.net/arcimg/hb/21f086c80c5d4afda1bc1029dadd8f3a.png)
>#Pytorch 气温预测
[TOC]

## 准备数据集

tn2>下载`temps.csv`数据集。

```bash
# 下载包
!wget https://raw.githubusercontent.com/AiDaShi/learningpytorch/main/010_015%EF%BC%9A%E7%A5%9E%E7%BB%8F%E7%BD%91%E7%BB%9C%E5%AE%9E%E6%88%98%E5%88%86%E7%B1%BB%E4%B8%8E%E5%9B%9E%E5%BD%92%E4%BB%BB%E5%8A%A1/%E7%A5%9E%E7%BB%8F%E7%BD%91%E7%BB%9C%E5%AE%9E%E6%88%98%E5%88%86%E7%B1%BB%E4%B8%8E%E5%9B%9E%E5%BD%92%E4%BB%BB%E5%8A%A1/temps.csv -O temps.csv
```

tn2>接下来我们要完成以下步骤：

![](https://img.tnblog.net/arcimg/hb/ff177d12fd8b43198e6e4fe554bde852.png)

## 开始编码

### 读取数据

tn2>导入相关依赖包。

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import torch
import torch.optim as optim
import warnings
warnings.filterwarnings("ignore")
%matplotlib inline
```

tn2>读取的csv数据。

```python
features = pd.read_csv('temps.csv')

# 看数据长什么样子
features.head()
```

![](https://img.tnblog.net/arcimg/hb/6c7b8183ef8d48c4b3977adb4c0e86d8.png)

| 列名  | 描述  |
| ------------ | ------------ |
| year,moth,day,week|分别表示的具体的时间|
| temp_2|前天的最高温度值|
| temp_1|昨天的最高温度值|
| average|在历史中，每年这一天的平均最高温度值|
| actual|这就是我们的标签值了，当天的真实最高温度|
| friend|这一列可能是凑热闹的，你的朋友猜测的可能值，咱们不管它就好了|

```python
# 查看数据维度
print('数据维度',features.shape)
```

>数据维度 (348, 9)

tn2>我们一共有348条，有九条数据列。
接着我们来处理一下时间格式。

### 处理时间格式

```python
# 处理时间数据
import datetime

# 分别得到年月日
years = features["year"]
months = features["month"]
days = features["day"]

# datetime拼装与格式化
dates = [str(int(year)) + '-' + str(int(month))+'-'+str(int(day)) for year,month,day in zip(years,months,days)]
dates = [datetime.datetime.strptime(date, '%Y-%m-%d') for date in dates]

# 查看前5条数据
dates[:5]
```

>[datetime.datetime(2016, 1, 1, 0, 0),
 datetime.datetime(2016, 1, 2, 0, 0),
 datetime.datetime(2016, 1, 3, 0, 0),
 datetime.datetime(2016, 1, 4, 0, 0),
 datetime.datetime(2016, 1, 5, 0, 0)]

```python
# 准备画风
# 指定默认风格
# plt.style.use('ggplot')
plt.style.use('fivethirtyeight')

# 设置布局
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(nrows=2, ncols=2, figsize=(10, 10))
fig.autofmt_xdate(rotation=45)

# 标签值 第一个参数是x，第二个参数是y
ax1.plot(dates, features["actual"], c="blue")
ax1.set_title("Max Temp");ax1.set_ylabel("Temperature");ax1.set_xlabel("")

# 昨天
ax2.plot(dates, features["temp_1"], c="red")
ax2.set_title("Previous Max Temp");ax2.set_ylabel("Temperature");ax2.set_xlabel("")

# 今天
ax3.plot(dates, features["temp_2"], c="green")
ax3.set_title("Two Days Prior Max Temp");ax3.set_ylabel("Temperature");ax3.set_xlabel("Date")

# 朋友
ax4.plot(dates, features["friend"], c="pink")
ax4.set_title("Friend Estimate");ax3.set_ylabel("Temperature");ax3.set_xlabel("Date")

# 子图边缘与图表边缘之间的填充量为2
plt.tight_layout(pad=2)
```

![](https://img.tnblog.net/arcimg/hb/8990016cc0414c46af7aaf89326f49c5.png)

tn2>由于week列它是字符串，我们可以通过独热编码将它进行转换成我们想要的数值形式。
对我们需要字符的列进行额外的编码。

### 文本热编码处理

tn2>`pd.get_dummies`方法它可以把包含有限几种类别的列（例如文本或分类数据）转换为数值形式。

```python
# 独热编码
features = pd.get_dummies(features)
features.head(5)
```

![](https://img.tnblog.net/arcimg/hb/cfd8fa9617f6454ab9393069dd07df2c.png)

tn2>接下来我们先把正确的标签列提出来，然后在特征中去掉标签，方便以后训练使用。

```python
# 标签
labels = np.array(features["actual"])
# 在特征中去掉标签
features = features.drop("actual", axis=1)
# 名字单独保存一下，以备后患
feature_list = list(features.columns)

# 转换成合适的格式
features = np.array(features)
```

```python
# 查看维度
features.shape
```

>(348, 14)

### 正态分布

```python
from sklearn import preprocessing
input_features = preprocessing.StandardScaler().fit_transform(features)
```

| 相关代码  | 描述  |
| ------------ | ------------ |
| `scikit-learn` |库进行特征标准化的操作|
| `StandardScaler`| 是用于标准化特征的工具,它通过移除均值并缩放到单位方差来标准化特征。|
| `.fit_transform(features)`| 方法首先计算了features的均值和标准差，然后使用这些参数将数据转换成标准化的形式|

tn2>最后的结果按照标准正态分布（均值为0，标准差为1）转换后的版本

```python
# 查看其中一条数据
input_features[0]
```

>array([ 0.        , -1.5678393 , -1.65682171, -1.48452388, -1.49443549,
       -1.3470703 , -1.98891668,  2.44131112, -0.40482045, -0.40961596,
       -0.40482045, -0.40482045, -0.41913682, -0.40482045])

tn>使用正态分布的好处在于收敛的速度快一些，收敛的误差更小一点

### 设置层级

```python
# 转换torch支持的格式
x = torch.tensor(input_features, dtype=float)
y = torch.tensor(labels, dtype=float)

# 权重参数初始化
# 这里把14个特征转换成128个隐藏特征（输入层有14个节点，而隐藏层有128个节点。）
weights = torch.randn((14,128), dtype=float, requires_grad=True)
# 创建了一个长度为128的一维张量，代表第一个隐藏层的偏置。
biases = torch.randn(128,dtype = float, requires_grad=True)
# 表示从第一个隐藏层到输出层的权重。这里假设输出层只有一个节点（常见于二分类或单值回归问题）
weights2 = torch.randn((128,1), dtype=float, requires_grad=True)
# 创建了一个单元素张量，代表输出层的偏置。
biases2 = torch.randn(1,dtype = float, requires_grad=True)

# 学习率
learning_rate = 0.001
# 损失
losses = []
```

![](https://img.tnblog.net/arcimg/hb/70f7c3f98f4742ab8a280c420467dbb7.png)

### 训练模型

tn2>训练模型的步骤如下图所示：

![](https://img.tnblog.net/arcimg/hb/4d407427949542c6b4309f176c9bc2a7.png)

```python

for i in range(1000):
    # 前向传播（计算隐藏层）
    hidden = x.mm(weights) + biases
    # 加入激活函数
    hidden = torch.relu(hidden)
    # 预测结果
    output = hidden.mm(weights2) + biases2
    # 通计算损失
    loss = torch.mean((output - y)**2)
    losses.append(loss.data.numpy())
    # 打印损失值
    if i % 100 == 0:
        print('loss:',loss)
    # 反向传播
    loss.backward()
    # 更新参数
    weights.data.add_(-learning_rate * weights.grad.data)
    biases.data.add_(-learning_rate * biases.grad.data)
    weights2.data.add_(-learning_rate * weights2.grad.data)
    biases2.data.add_(-learning_rate * biases2.grad.data)
    # 更新参数
    weights.grad.data.zero_()
    biases.grad.data.zero_()
    weights2.grad.data.zero_()
    biases2.grad.data.zero_()
```

>loss: tensor(5887.5310, dtype=torch.float64, grad_fn=<MeanBackward0>)
loss: tensor(152.7882, dtype=torch.float64, grad_fn=<MeanBackward0>)
loss: tensor(147.1391, dtype=torch.float64, grad_fn=<MeanBackward0>)
loss: tensor(144.9651, dtype=torch.float64, grad_fn=<MeanBackward0>)
loss: tensor(143.6304, dtype=torch.float64, grad_fn=<MeanBackward0>)
loss: tensor(142.7190, dtype=torch.float64, grad_fn=<MeanBackward0>)
loss: tensor(142.0560, dtype=torch.float64, grad_fn=<MeanBackward0>)
loss: tensor(141.5524, dtype=torch.float64, grad_fn=<MeanBackward0>)
loss: tensor(141.1599, dtype=torch.float64, grad_fn=<MeanBackward0>)
loss: tensor(140.8559, dtype=torch.float64, grad_fn=<MeanBackward0>)

tn2>我们可以看到误差在慢慢变小。

### 更简单的构建网络模型

tn2>这里使用了 `Sigmoid` 函数的特点是它能够将任何实数值映射到 (0, 1) 区间内，这使得它非常适合用于将任意值转换为概率。

| 方法  | 描述  |
| ------------ | ------------ |
| ` torch.nn.Linear` | 是一个模块，用于创建一个线性变换。 |
|`torch.nn.MSELoss` |是一个损失函数，用于计算均方误差（Mean Squared Error, MSE）。它主要用于回归任务。|
|`torch.optim.SGD`| 是随机梯度下降（Stochastic Gradient Descent, SGD）优化器的实现。|

```python
# 输入层数量
input_size = input_features.shape[1] # 确定输入特征的数量
# 输出层数量
output_size = 1
# 隐藏层数量
hidden_size = 128
# 批量处理大小，指定了每次训练过程中同时处理的数据样本数量
batch_size = 16
# 构建神经网络模型
my_nn = torch.nn.Sequential(
    torch.nn.Linear(input_size, hidden_size), # 添加输入层到隐藏层
    torch.nn.Sigmoid(),# 在隐藏层后添加Sigmoid激活函数
    torch.nn.Linear(hidden_size, output_size) # 添加隐藏层到输出层
)
cost = torch.nn.MSELoss(reduction='mean') # 使用均方误差损失函数，用于回归问题
optimizer = torch.optim.SGD(my_nn.parameters(), lr=0.001) # 使用随机梯度下降优化器，学习率设为0.001
```

```python
# 损失集合
losses = []
for i in range(1000): # 迭代1000次
    batch_loss = [] # 用于存储每个批次的损失
    # MINI-Batch 方法来进行训练
    for start in range(0, len(input_features), batch_size):
        # 计算批次的结束索引
        end = start + batch_size if start + batch_size < len(input_features) else len(input_features)
        # 从输入特征和标签中获取当前批次的数据
        xx = torch.tensor(input_features[start:end], dtype=torch.float,requires_grad=True)
        yy = torch.tensor(labels[start:end], dtype=torch.float,requires_grad=True)
        # 通过网络进行前向传播得到预测结果
        prediction = my_nn(xx)
        # 计算损失
        loss = cost(prediction, yy)
        # 清除之前的梯度
        optimizer.zero_grad()
        # 反向传播，计算梯度
        loss.backward(retain_graph=True)
        # 更新权重
        optimizer.step()
        # 记录批次损失
        batch_loss.append(loss.data.numpy())
    
    # 打印损失
    if i % 100 == 0:
        losses.append(np.mean(batch_loss))
        print(i, np.mean(batch_loss))
```

>0 1322.4116
100 37.27904
200 36.839382
300 36.445923
400 36.126297
500 35.88112
600 35.694214
700 35.547215
800 35.425312
900 35.318115

tn2>预测训练结果

```python
x = torch.tensor(input_features, dtype=torch.float)
prediction = my_nn(x).data.numpy()
```

```python
# 转换日期格式
datas = [str(int(year)) + '-' + str(int(month)) + '-' + str(int(day)) for year, month, day in zip(years,months,days)]
datas = [datetime.datetime.strptime(data, '%Y-%m-%d') for data in datas]

# 创建一个表格来存日期和其对应的标签数值
true_data = pd.DataFrame(data={'date':datas, 'actual':labels})
# 同理，再创建一个来存日期和其对应的模型预测值
months = features[:, feature_list.index('month')]
days = features[:, feature_list.index('day')]
years = features[:, feature_list.index('year')]

test_dates = [str(int(year)) + '-' + str(int(month)) + '-' + str(int(day)) for year, month, day in zip(years,months,days)]

test_dates = [datetime.datetime.strptime(data, '%Y-%m-%d') for data in test_dates]

prediction_data = pd.DataFrame(data={'date':test_dates, 'prediction':prediction.reshape(-1)})
```

```python
# 真实值
plt.plot(true_data['date'], true_data['actual'], 'b-', label = 'actual')
# 预测值
plt.plot(prediction_data['date'], prediction_data['prediction'], 'ro', label = 'prediction')
plt.xticks(rotation = 60); 
plt.legend()

# 图名
plt.xlabel('Date'); plt.ylabel('Maximum Temperature (F)'); plt.title('Actual and Predicted Values');
```

![](https://img.tnblog.net/arcimg/hb/3a169cc833534eb99dd1c4574ef4800a.png)

欢迎加群讨论技术，1群：677373950(满了，可以加，但通过不了)，2群：656732739

👈{{preArticle.title}}

👉{{nextArticle.title}}

评价

尘叶心繁

这一世以无限游戏为使命！

博主信息

排名

文章

粉丝

文章类别

.net后台框架 185篇

linux 18篇

linux中cve 1篇

windows中cve 0篇

资源分享 10篇

Win32 3篇

前端 28篇

传说中的c 5篇

Xamarin 9篇

docker 15篇

容器编排 101篇

grpc 4篇

Go 15篇

yaml模板 1篇

理论 2篇

Pytorch 气温预测

{{titleitem}} {{titleitem}}

{{titleitem}} {{titleitem}}