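Reference: https://www.cnblogs.com/gentlesunshine/p/12452360.html
That post covers the older ML-Agents v0.15.0, which was based on TensorFlow, but setting up the environment works on much the same principles.
1. PyTorch, CUDA, and cuDNN version problems
I installed everything by following the tutorial, and this showed up when I started training: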
data:image/s3,"s3://crabby-images/3685f/3685f9b315ceac837a9f1e307d7d001155b2129a" alt=""
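It means PyTorch 1.6.0 or later is required, but according to the post linked below, CUDA 10.0 only supports PyTorch up to 1.1.0, so CUDA and cuDNN had to be reinstalled.
Reference: https://blog.csdn.net/HaoZiHuang/article/details/107878351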
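To make the version requirement concrete, here is a minimal sketch of the comparison, assuming the minimum really is 1.6.0 as the error message says. The function names `parse_version` and `meets_minimum` are made up for illustration; they are not part of ML-Agents or PyTorch.

```python
def parse_version(text):
    """Turn a version string such as '1.7.1+cu110' into (1, 7, 1)."""
    core = text.split("+", 1)[0]  # drop a local build suffix like '+cu110'
    parts = []
    for piece in core.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # stop at suffixes such as 'a0' or 'rc1'
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)  # treat '1.6' the same as '1.6.0'
    return tuple(parts)


def meets_minimum(installed, minimum="1.6.0"):
    """Return True if the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


if __name__ == "__main__":
    for candidate in ("1.1.0", "1.7.1+cu110"):
        print(candidate, meets_minimum(candidate))
```

Comparing tuples of integers rather than raw strings matters here, because as plain strings "1.10.0" would sort before "1.6.0".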
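At this point I frowned; this was starting to look like a big problem. So I, who never read official documentation, went and took a peek.
Official docs: https://github.com/Unity-Technologies/ml-agents/blob/release_12_docs/docs/localized/zh-CN/docs/Readme.md
The official docs really are better.
Downloaded CUDA 11.2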
data:image/s3,"s3://crabby-images/825bc/825bc2cb81f43229ec49af2f6cfa575bbae0bda2" alt=""
Then the latest cuDNN (it says it supports CUDA 11.1, but that was the only option I had)
data:image/s3,"s3://crabby-images/f437a/f437ab6e643ce5cd8b62812d4fb9181a13b48e65" alt=""
Installing PyTorch through PyCharm (this failed)
data:image/s3,"s3://crabby-images/ff741/ff741fe195a8d47c480cb4734f26c76aa5e2232d" alt=""
It errored out
data:image/s3,"s3://crabby-images/6bc36/6bc36759ec10cc9d9fa99e56286d0ed268440d67" alt=""
Off to the official site for a look
data:image/s3,"s3://crabby-images/4c1ad/4c1ade323519d41bfacd590792f651e04cd189f3" alt=""
Surely this will finally work:
pip install C:\Users\liyuanhang\Desktop\torch-1.7.1+cu110-cp38-cp38-win_amd64.whl
data:image/s3,"s3://crabby-images/0e860/0e8606bfac106ce4de0a8476cdb19fae6c5f8ef5" alt=""
data:image/s3,"s3://crabby-images/ab888/ab8888aaea4c3fcbd71d3c88216a22609dc5dfa3" alt=""
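The wheel file name in the pip command above already says which Python and which CUDA build it targets. A rough sketch of reading those tags, assuming the standard wheel naming layout (name, version, optional build tag, Python tag, ABI tag, platform tag); `wheel_tags` and `running_python_tag` are hypothetical helper names, and the second one only makes sense on CPython.

```python
import sys


def wheel_tags(filename):
    """Split a wheel file name into its name, version, and compatibility tags."""
    stem = filename[:-len(".whl")] if filename.endswith(".whl") else filename
    fields = stem.split("-")
    if len(fields) < 5:
        raise ValueError("not a wheel file name: " + filename)
    version = fields[1]
    return {
        "name": fields[0],
        "version": version,
        # PyTorch marks the CUDA build as a local version suffix, e.g. '+cu110'
        "build_variant": version.split("+", 1)[1] if "+" in version else None,
        "python": fields[-3],
        "abi": fields[-2],
        "platform": fields[-1],
    }


def running_python_tag():
    """Tag of the interpreter running this script, e.g. 'cp38' (CPython only)."""
    return "cp{}{}".format(sys.version_info.major, sys.version_info.minor)


if __name__ == "__main__":
    tags = wheel_tags("torch-1.7.1+cu110-cp38-cp38-win_amd64.whl")
    print(tags)
    print("matches this interpreter:", tags["python"] == running_python_tag())
```

If the Python tag does not match the interpreter in the active environment, pip refuses the wheel as unsupported on this platform, so this is worth checking before downloading a large file.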
Checking whether PyTorch can use the GPU and CUDA
data:image/s3,"s3://crabby-images/4093d/4093d3bd3646494b94cfee7c6adbfce5d077665c" alt=""
Does that mean my computer has no NVIDIA GPU?
Alarmed, I went and checked right away
data:image/s3,"s3://crabby-images/2dc76/2dc7617007a012deb1b98885721ad6b22401056d" alt=""
It really doesn't have one
Reference: https://blog.csdn.net/weixin_41194129/article/details/107475509
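For the record, a check along these lines is what tells you whether PyTorch can see a GPU. This is only a sketch: `cuda_report` is a name I made up, and the import is wrapped in a try block so the script still runs in an environment where PyTorch is not installed yet.

```python
def cuda_report():
    """Collect what PyTorch reports about CUDA, or note that torch is missing."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}

    available = torch.cuda.is_available()
    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # None here means a CPU-only build of PyTorch
        "built_with_cuda": torch.version.cuda,
        "cuda_available": available,
        "device_name": torch.cuda.get_device_name(0) if available else None,
    }


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(key, "=", value)
```

If `cuda_available` comes back False on a CUDA build, the usual suspects are a missing NVIDIA card (my case) or a driver that is too old for that CUDA version.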
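Fine, time to switch computers
There's an old computer at home. Its specs are terrible, but the graphics card is NVIDIA, so I tried installing there
Same routine: Python, CUDA, cuDNN, Anaconda, PyTorch
Then an error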
data:image/s3,"s3://crabby-images/08556/08556f17df8b802bb56f72b5b5240f6f26b2a04c" alt=""
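Reference: https://blog.csdn.net/weixin_42868552/article/details/107990522
Reference: https://blog.csdn.net/hinson0710/article/details/107656971
But installing the VC runtime libraries didn't help, and deleting caffe2_detectron_ops_gpu didn't help either; other files still raised errors
Is the version still wrong?
I looked everywhere and found nothing that worked
Then it occurred to me that the ProgramData folder seems to be "read-only" and "hidden" by default
Changing that didn't fix it either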
So I went back to my own computer and installed PyTorch without CUDA
Then followed along with the official docs
Running mlagents-learn gave this:
mlagents.trainers.exception.UnityTrainerException: Previous data from this run ID was found. Either specify a new run ID, use --resume to resume this run, or use the --force parameter to overwrite existing data.
data:image/s3,"s3://crabby-images/eadf4/eadf4b7250118a9baa578f53f62bc1983eaf7042" alt=""
In short: data from an earlier run with this run ID already exists. Pick a new run ID, pass --resume to continue that run, or pass --force to overwrite the existing data.
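The three ways out that the message lists can be written down as a tiny command builder. This is only an illustration: `learn_command` and its `mode` argument are hypothetical, while `--run-id`, `--resume`, and `--force` are the flags named in the error text above.

```python
def learn_command(config_path, run_id, mode="new"):
    """Build an mlagents-learn command line as a list of arguments.

    mode: 'new'    -> use a run ID that has no previous data
          'resume' -> continue the existing run with this ID
          'force'  -> overwrite the existing data for this ID
    """
    command = ["mlagents-learn", config_path, "--run-id=" + run_id]
    if mode == "resume":
        command.append("--resume")
    elif mode == "force":
        command.append("--force")
    elif mode != "new":
        raise ValueError("mode must be 'new', 'resume', or 'force'")
    return command


if __name__ == "__main__":
    # Config path and run ID are taken from the commands recorded in this post.
    for mode in ("new", "resume", "force"):
        print(" ".join(learn_command("config/ppo/3DBall.yaml", "tryball01", mode)))
```

Returning a list rather than one string means the result can be handed to `subprocess.run` directly if you want to launch training from a script.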
Ran mlagents-learn --resume
and got this:
mlagents_envs.exception.UnityTimeOutException: The Unity environment took too long to respond. Make sure that : The environment does not need user interaction to launch The Agents' Behavior Parameters > Behavior Type is set to "Default" The environment and the Python interface have compatible versions.
data:image/s3,"s3://crabby-images/23e3f/23e3f7f1296afbef6df27f3baa93be5539f3d2c7" alt=""
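In other words, the Unity environment took too long to respond.
The message says to make sure that the environment does not need user interaction to launch,
that the agent's Behavior Parameters > Behavior Type is set to "Default",
and that the environment and the Python interface have compatible versions.
I looked it up: some people hit this because their project path contains Chinese characters, but mine doesn't, so PASS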
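If you want to rule out the path explanation quickly, a check like the following is enough. It is a sketch with a made-up helper name, `non_ascii_parts`, and it needs Python 3.7 or later for `str.isascii`; the second sample path is invented for the demonstration.

```python
def non_ascii_parts(path):
    """Return the path components that contain non-ASCII characters."""
    normalized = path.replace("\\", "/")  # accept Windows and POSIX separators
    return [part for part in normalized.split("/")
            if part and not part.isascii()]


if __name__ == "__main__":
    samples = [
        r"C:\Users\liyuanhang\Desktop\mirrors-Unity-ML-Agents-master\Unity-ML-Agents",
        "D:/工程/Walker",  # hypothetical path with a Chinese folder name
    ]
    for sample in samples:
        print(sample, "->", non_ascii_parts(sample) or "ASCII only")
```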
Never mind, let's get the demo running first
data:image/s3,"s3://crabby-images/28057/28057bd81e01b622f2f54d1b5eb0d3acdd4d7155" alt=""
But TensorBoard didn't come up
data:image/s3,"s3://crabby-images/89855/89855668e59c99b9bdcb7d8f580047dba13dbb3c" alt=""
Let's install TensorFlow then
pip install tensorflow-cpu -i https://pypi.douban.com/simple/
It errored out partway through the install
A few commands for the record:
// mlagents-learn config/try1_config.yaml --run-id=try1-1 --train
// mlagents-learn config\trainer_config.yaml --run-id=test01 --train
// mlagents-learn config\ppo\3DBall.yaml --run-id=tryball01 --train
//C:\Users\liyuanhang\Desktop\mirrors-Unity-ML-Agents-master\Unity-ML-Agents\config\ppo
// tensorboard --logdir="C:\Users\liyuanhang\Desktop\mirrors-Unity-ML-Agents-master\Unity-ML-Agents\results\tryball01\run_logs" --host=127.0.0.1
// mlagents-learn config\ppo\WalkerDynamic.yaml --run-id=tryWalker01 --train
// mlagents-learn config\ppo\WalkerDynamic.yaml --run-id=tryWalker0416 --train
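Summary:
Without an NVIDIA graphics card, install the CPU build of PyTorch; ML-Agents itself installs as usual. Parts of the project structure differ from before and some commands have changed, so following the official docs is the safe route
Rough going for a newbie like me
[0118] Many days later I found this still sitting in my drafts; posting it as a placeholder for now
[0120] Found that TensorBoard wasn't showing up because of the browser: I had been using QQ Browser, and it worked after switching to Chrome
To run it, open a separate environment, change into the project directory, and run tensorboard --logdir results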
data:image/s3,"s3://crabby-images/066ed/066edff77ecfc33e3dceff29b044b65f7a6c1263" alt=""
data:image/s3,"s3://crabby-images/9ac4f/9ac4f281777740ab779368fc51817b700b8590b8" alt=""
data:image/s3,"s3://crabby-images/48a41/48a41c624594a4b2e01216c5a101f34d86bf83d2" alt="" data:image/s3,"s3://crabby-images/57a16/57a16be8306ee838a5bc4ab92a7a9e498111d240" alt=""
Training a ball to chase a cube
data:image/s3,"s3://crabby-images/4c259/4c259d950b646a041bf8e73ce9a97b65f7626eb7" alt=""
(base) C:\Users\liyuanhang\Desktop\mirrors-Unity-ML-Agents-master\Unity-ML-Agents>mlagents-learn config/try1_config.yaml --run-id=firstTry01
data:image/s3,"s3://crabby-images/70d40/70d40f1db7df71dea247f4f15e42b880724a44df" alt=""
data:image/s3,"s3://crabby-images/4c395/4c3955cf52f7ee45468e44a4716dbbc33cd01dcd" alt=""
data:image/s3,"s3://crabby-images/6b3d8/6b3d8534936b9f841b7b6721927046bbb376a897" alt=""
data:image/s3,"s3://crabby-images/a2a38/a2a38b27f6b2fb2a41253591605d2af5c579c44a" alt=""
2021-0219: When creating a new project, first install ml-agents in the Package Manager, then upgrade it
data:image/s3,"s3://crabby-images/1331a/1331a63a22ab6dbaa0af33d989695fcf1466d208" alt=""
After I moved the project to a different location, training raised an error again
data:image/s3,"s3://crabby-images/764d7/764d73cc137c63eff0c7ccf09be1e6ce5d12371a" alt=""
At this point you need to run:
pip install -e ml-agents-envs -i http://pypi.douban.com/simple/ --trusted-host pypi.douban.com
pip install -e ml-agents -i http://pypi.douban.com/simple/ --trusted-host pypi.douban.com
Then another error
data:image/s3,"s3://crabby-images/ab636/ab6367f21ac7808870e47803527951dadf6b375c" alt=""
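Maybe pip needs updating
But that turned out not to help
It's probably a package version problem, but which packages?
Never mind: I created a fresh environment, installed PyTorch, ran the two commands above, and it was OK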
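The rough steps for swapping in a new model for Walker: modeling, rigging, skinning, importing into Unity, binding the bones in the WalkerAgent script, generating the ragdoll, deleting the character controller, adding the overridden one, then setting up the relationships between the rigid bodies, the colliders, and the ground-contact script
0416: Found that it works without a skeleton. Bones that are linked to each other keep the model from moving, so build the model as separate parts; no skinning is needed
With --resume you have to change the config, otherwise training only runs to 500000 steps, which wasn't enough to reach the goal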
data:image/s3,"s3://crabby-images/d2de5/d2de51c5582b085d49e88df5c886ea0b2fed3e48" alt=""
But after changing the config it wouldn't run
data:image/s3,"s3://crabby-images/64d52/64d523ddef7d0a910a79528a5f0bfaba44f80192" alt=""
Er, that was because I hadn't changed the behavior name. After changing it, a brand-new adventure began
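For the step limit mentioned above, here is a rough sketch of raising it in the trainer config from a script. It assumes the limit is stored as a `max_steps: <integer>` line in the YAML file, and it edits the text with a regular expression so that no YAML library is needed. `set_max_steps`, the behavior name `MyBehavior`, and the sample config are all hypothetical.

```python
import re

# Matches a line such as '    max_steps: 500000', keeping indentation and any comment.
MAX_STEPS_LINE = re.compile(r"^([ \t]*max_steps:[ \t]*)\d+([ \t]*(?:#.*)?)$",
                            re.MULTILINE)


def set_max_steps(config_text, new_max_steps):
    """Return config_text with every integer max_steps value replaced."""
    updated, count = MAX_STEPS_LINE.subn(
        lambda match: match.group(1) + str(new_max_steps) + match.group(2),
        config_text,
    )
    if count == 0:
        raise ValueError("no 'max_steps: <integer>' line found")
    return updated


if __name__ == "__main__":
    sample = (
        "behaviors:\n"
        "  MyBehavior:\n"          # must match the Behavior Name set in Unity
        "    trainer_type: ppo\n"
        "    max_steps: 500000\n"
    )
    print(set_max_steps(sample, 1000000))
```

As I found out the hard way, the behavior name in the config still has to match the one set on the agent in Unity, whatever else you change.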
0423: The Target stopped refreshing. The cause was that my own model didn't have the agent tag on it
data:image/s3,"s3://crabby-images/6b26e/6b26efd3440091d51f9932aa2e5d262aa326d32c" alt=""