
[Python] resnext speed [pretrained-models.pytorch]

Mctigger 2017-10-9

Hey,
I tried resnext-101 (32x4d) for 160x160 crops and also the torchvision resnet-101. For me, resnext needs nearly double the time to train compared to the resnet. (The only thing I changed is the final GlobalAveragePooling and classifier layer.)
Is this expected behavior?
Here it says the required FLOPs are the same, but obviously frameworks like PyTorch may be slower because of a more complicated network structure.
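
For context, a minimal sketch of what "only changing the final GlobalAveragePooling and classifier layer" can look like with pretrained-models.pytorch. The attribute names (avg_pool, last_linear) follow the repo's convention but are worth double-checking against your version, and the 10-class head is just an illustrative assumption, not the poster's actual code:

```python
import torch
import torch.nn as nn
import pretrainedmodels

# Load the pretrained ResNeXt-101 (32x4d) from pretrained-models.pytorch.
model = pretrainedmodels.__dict__['resnext101_32x4d'](num_classes=1000,
                                                      pretrained='imagenet')

# A 160x160 crop gives a 5x5 final feature map (stride 32), so the fixed 7x7
# average pool has to go; adaptive pooling collapses any spatial size to 1x1.
model.avg_pool = nn.AdaptiveAvgPool2d(1)

# Replace the classifier with one for the target dataset (10 classes here is
# just an example).
model.last_linear = nn.Linear(model.last_linear.in_features, 10)

x = torch.randn(2, 3, 160, 160)
print(model(x).shape)  # expected: torch.Size([2, 10])
```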

Comments (9)
Cadene 2017-10-9
1

What do you mean by "double the time to train compared to the resnet"? Double the time to do a forward + backward pass?

Also, usually those resnet / resnext networks take crops of size 224x224. Why do you use 160x160 crops?

But anyway, you are right: because of obvious implementation reasons, the processing time may be higher for resnext, even if the amount of FLOPs is almost the same.
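
One way to put numbers on this is to time a forward + backward pass directly rather than counting FLOPs. A rough sketch only; batch size, crop size, and iteration counts are arbitrary choices, and `pretrained=None` for random weights is assumed to be accepted by the repo's factory functions:

```python
import time
import torch
import torch.nn as nn
import torchvision.models as models
import pretrainedmodels

def time_fwd_bwd(model, iters=20, batch=16, size=160):
    """Average seconds per forward + backward pass on random data."""
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    model = model.to(device).train()
    x = torch.randn(batch, 3, size, size, device=device)
    target = torch.randint(0, 1000, (batch,), device=device)
    criterion = nn.CrossEntropyLoss()
    for _ in range(3):                      # warm-up (cudnn autotune, lazy init)
        model.zero_grad()
        criterion(model(x), target).backward()
    if device == 'cuda':
        torch.cuda.synchronize()
    start = time.time()
    for _ in range(iters):
        model.zero_grad()
        criterion(model(x), target).backward()
    if device == 'cuda':
        torch.cuda.synchronize()
    return (time.time() - start) / iters

resnet = models.resnet101()
resnext = pretrainedmodels.__dict__['resnext101_32x4d'](num_classes=1000,
                                                        pretrained=None)
resnext.avg_pool = nn.AdaptiveAvgPool2d(1)  # needed for 160x160 inputs, as above
print('resnet101  fwd+bwd: %.3f s' % time_fwd_bwd(resnet))
print('resnext101 fwd+bwd: %.3f s' % time_fwd_bwd(resnext))
```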

Mctigger 2017-10-9
2

Yea, training for a whole epoch. So just like you said forward + backward. My use case is on 160x160 crops and I wanted to try resnext. Your repository makes this very easy :)

Good to know. I just wanted to check that I haven't missed something or that my implementation is buggy.

Cadene 2017-10-9
3

Edit: Also, usually those resnet / resnext networks take crops of size 224x224.

Mctigger 2017-10-9
4

I understand. But I don't want to train on ImageNet. I am using transfer learning on a 180x180 dataset with 160x160 crops.
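
As an illustration of that setup, a hypothetical input pipeline for a 180x180 dataset trained with random 160x160 crops; the normalization values are the usual ImageNet statistics, not details taken from the thread:

```python
import torchvision.transforms as transforms

normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])

# Training: random 160x160 crops out of the 180x180 images, plus a flip.
train_tf = transforms.Compose([
    transforms.RandomCrop(160),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    normalize,
])

# Validation: a deterministic center crop at the same size.
val_tf = transforms.Compose([
    transforms.CenterCrop(160),
    transforms.ToTensor(),
    normalize,
])
```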

Cadene 2017-10-9
5

I am surprised that it works with 160x160 inputs. If I were you, I would try upsampling from 180x180 to 224x224 (maybe a bit more if you want to add some data augmentation) :P

Regards
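
Sketched with torchvision transforms, the suggestion above could look roughly like this; the 256/224 sizes are common defaults rather than values from the thread:

```python
import torchvision.transforms as transforms

# Upsample the 180x180 images a bit past 224, then take 224x224 crops with
# some augmentation headroom, and normalize with the ImageNet statistics.
upsample_train_tf = transforms.Compose([
    transforms.Resize(256),            # 180x180 -> 256x256
    transforms.RandomCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
```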

Mctigger 2017-10-9
6

Yea, I already thought about upsampling, but came to the conclusion that it most probably makes more sense to use dilated convolutions instead, since then my input is not interpolated but the network still maintains high-resolution feature maps.
The low-level layers can still stay the same and use the ImageNet weights as initialization, since these are pretty much only simple filters (which, for example, react to edges, like Sobel). The mid- to high-level filters will probably have to change a lot, though.
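
A sketch of the "keep the low-level filters, let the higher layers adapt" part, shown on torchvision's resnet101 because its stages are easy to address by name; which layers count as low-level is a judgment call, not something specified in the thread:

```python
import torch
import torchvision.models as models

# Newer torchvision uses the weights= argument; older versions use pretrained=True.
model = models.resnet101(weights="IMAGENET1K_V1")

# Freeze the stem and the first residual stage; their edge/texture-like filters
# are assumed to transfer as-is, while layer2-layer4 and the classifier remain
# trainable and are free to move away from the ImageNet solution.
for module in (model.conv1, model.bn1, model.layer1):
    for p in module.parameters():
        p.requires_grad = False

trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=1e-3, momentum=0.9)
```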

Cadene 2017-10-9
7

Wow that's cool, did you get better results using dilated convolutions?

Mctigger 2017-10-9
8

I did not try any of this yet. These are just my thoughts. I actually do not know of a state-of-the-art paper for classification using dilated convolutions. But in semantic segmentation they are often used to achieve a greater field of view. Intuitively, it seems like in general the field of view can be increased either by some pooling operation/strided convolution (which reduces resolution; that's why a 224 input is better than 160) or by using dilated convolutions (resolution stays the same, but computation increases).
Since I assume that the low-level filters are not an accuracy bottleneck, but the mid-/high-level features are, it may make sense to use dilated convolutions there.
So maybe I will try to use the normal architecture for 160 -> 80 -> 40 -> 20, then remove the remaining two pooling/striding operations and use dilated convolutions on the 20x20 feature maps.

But like I said, I didn't try any of this yet!
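
For what it's worth, recent torchvision ResNets can express exactly this 160 -> 80 -> 40 -> 20 plan via replace_stride_with_dilation, which turns the stride-2 convolutions of the chosen stages into dilated stride-1 convolutions so the feature map stays at 20x20. A sketch of the idea (on resnet101, not the resnext from this repo):

```python
import torch
import torchvision.models as models

# Keep the stride in layer2, replace the strides in layer3/layer4 by dilation.
model = models.resnet101(replace_stride_with_dilation=[False, True, True])

x = torch.randn(1, 3, 160, 160)
f = model.maxpool(model.relu(model.bn1(model.conv1(x))))  # 160 -> 80 -> 40
f = model.layer1(f)   # 40x40
f = model.layer2(f)   # 40 -> 20
f = model.layer3(f)   # stays 20x20, dilation 2
f = model.layer4(f)   # stays 20x20, dilation 4
print(f.shape)        # expected: torch.Size([1, 2048, 20, 20])
```

The dilated stages do more work per output pixel, so the earlier caveat about FLOPs vs. wall-clock time applies here as well.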

Cadene 2017-10-9
9

Please, let me know :)
