爱可可 AI Frontier Picks (7.17)

Posted: 2022-7-25 08:29 · 6598 views · 0 replies
LG - Machine Learning   CV - Computer Vision   CL - Computation and Language   AS - Audio and Speech   RO - Robotics

Highlights: volatility-based kernels and moving-average means for accurate forecasting with Gaussian processes; learning long-term dependencies in music with diverse instruments; improving Wikipedia verifiability with AI; understanding machine learning models with open-ended dialogues; factorized and controllable neural re-rendering of outdoor scenes for photo extrapolation; a web IDE for reinforcement learning; generating code by retrieving and reading docs; automating robotic reinforcement learning with prior data; a Parallel-ConvMLP and implicit-transformation GAN for cross-view image translation

1、[LG] Volatility Based Kernels and Moving Average Means for Accurate Forecasting with Gaussian Processes


G Benton, W J Maddox, A G Wilson
[New York University]
Volatility-based kernels and moving-average means for accurate forecasting with Gaussian processes. A large class of stochastic volatility models is defined by systems of stochastic differential equations. Despite wide success in finance and statistical climatology, these models typically cannot condition on historical data to produce a true posterior. The paper recasts such models as hierarchical Gaussian process (GP) models with specialized covariance functions, preserving their inductive biases while gaining the posterior predictive distribution that GP inference provides. Within this framework the authors introduce a new class of models, Volt and Magpie, which clearly outperform baselines on stock and wind-speed forecasting and extend naturally to the multitask setting.
A broad class of stochastic volatility models are defined by systems of stochastic differential equations. While these models have seen widespread success in domains such as finance and statistical climatology, they typically lack an ability to condition on historical data to produce a true posterior distribution. To address this fundamental limitation, we show how to re-cast a class of stochastic volatility models as a hierarchical Gaussian process (GP) model with specialized covariance functions. This GP model retains the inductive biases of the stochastic volatility model while providing the posterior predictive distribution given by GP inference. Within this framework, we take inspiration from well studied domains to introduce a new class of models, Volt and Magpie, that significantly outperform baselines in stock and wind speed forecasting, and naturally extend to the multitask setting.
https://arxiv.org/abs/2207.06544
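
The hierarchical construction above builds on standard exact GP inference. As a minimal sketch of that machinery — using a generic RBF kernel rather than the paper's volatility-based covariances, with hypothetical helper names — the posterior predictive mean and variance in plain numpy:

```python
import numpy as np

def rbf_kernel(x1, x2, lengthscale=1.0, variance=1.0):
    """Squared-exponential covariance between two sets of 1-D inputs."""
    d = x1[:, None] - x2[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_posterior(x_train, y_train, x_test, noise=1e-4):
    """Exact GP posterior predictive mean and variance at x_test."""
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf_kernel(x_train, x_test)
    Kss = rbf_kernel(x_test, x_test)
    L = np.linalg.cholesky(K)                      # stable solve via Cholesky
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.diag(Kss) - np.sum(v ** 2, axis=0)    # predictive variance
    return mean, var

x = np.linspace(0, 5, 20)
y = np.sin(x)
mean, var = gp_posterior(x, y, x)  # conditioning on history, unlike raw SDE models
```

The paper's contribution is the specialized covariance functions that make this conditioning behave like a stochastic volatility model; the inference step itself is the standard one shown here.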

2、[AS] Multitrack Music Transformer: Learning Long-Term Dependencies in Music with Diverse Instruments


H Dong, K Chen, S Dubnov, J McAuley, T Berg-Kirkpatrick
[University of California San Diego]
Multitrack Music Transformer: learning long-term dependencies in music with diverse instruments. Existing transformer approaches to multitrack music generation are limited to a small set of instruments or to short segments, partly because existing multitrack representations require long input sequences with large memory costs. The paper proposes a compact representation that supports a diverse instrument set while keeping sequences short, and builds on it the Multitrack Music Transformer (MTMT) for learning long-term dependencies in multitrack music. In a subjective listening test, the model matches two baselines in unconditioned generation quality, while producing samples twice as long in half the inference time. The paper also proposes a new measure for analyzing musical self-attention, showing that the trained model attends less to notes forming dissonant intervals with the current note and more to notes 4N beats away, providing a foundation for longer-form multitrack generation and better musical self-attention.
Existing approaches for generating multitrack music with transformer models have been limited to either a small set of instruments or short music segments. This is partly due to the memory requirements of the lengthy input sequences necessitated by existing representations for multitrack music. In this work, we propose a compact representation that allows a diverse set of instruments while keeping a short sequence length. Using our proposed representation, we present the Multitrack Music Transformer (MTMT) for learning long-term dependencies in multitrack music. In a subjective listening test, our proposed model achieves competitive quality on unconditioned generation against two baseline models. We also show that our proposed model can generate samples that are twice as long as those produced by the baseline models, and, further, can do so in half the inference time. Moreover, we propose a new measure for analyzing musical self-attentions and show that the trained model learns to pay less attention to notes that form a dissonant interval with the current note, yet attends more to notes that are 4N beats away from the current one. Finally, our findings provide a novel foundation for future work exploring longer-form multitrack music generation and improving self-attentions for music. All source code and audio samples can be found at this https URL.
https://arxiv.org/abs/2207.06983

3、[IR] Improving Wikipedia Verifiability with AI


F Petroni, S Broscheit, A Piktus, P Lewis, G Izacard, L Hosseini, J Dwivedi-Yu, M Lomeli, T Schick, P Mazaré, A Joulin, E Grave, S Riedel
[Meta AI & Amazon Alexa AI]
Improving Wikipedia verifiability with AI. Verifiability is a core content policy of Wikipedia: claims likely to be challenged must be backed by citations. With millions of articles online and thousands added each month, finding relevant sources is hard: many claims lack supporting references, and even existing citations may not support a given claim, or may go stale once the source is updated or removed. Maintaining and improving citation quality is therefore an important challenge, and better tools are needed to assist humans. The paper shows this process can be tackled with AI, developing a neural-network-based system, SIDE, that identifies Wikipedia citations unlikely to support their claims and recommends better ones from the web. Trained on existing Wikipedia references, the model learns from the contributions and collective wisdom of thousands of editors. In crowdsourced evaluation, for the top 10% of citations the system flags as most likely unverifiable, humans preferred its suggested alternatives over the originally cited references 70% of the time. The results suggest AI-based systems can work alongside humans to improve Wikipedia's verifiability, and more broadly can assist fact-checking and raise the trustworthiness of online information.
Verifiability is a core content policy of Wikipedia: claims that are likely to be challenged need to be backed by citations. There are millions of articles available online and thousands of new articles are released each month. For this reason, finding relevant sources is a difficult task: many claims do not have any references that support them. Furthermore, even existing citations might not support a given claim or become obsolete once the original source is updated or deleted. Hence, maintaining and improving the quality of Wikipedia references is an important challenge and there is a pressing need for better tools to assist humans in this effort. Here, we show that the process of improving references can be tackled with the help of artificial intelligence (AI). We develop a neural network based system, called Side, to identify Wikipedia citations that are unlikely to support their claims, and subsequently recommend better ones from the web. We train this model on existing Wikipedia references, therefore learning from the contributions and combined wisdom of thousands of Wikipedia editors. Using crowd-sourcing, we observe that for the top 10% most likely citations to be tagged as unverifiable by our system, humans prefer our system’s suggested alternatives compared to the originally cited reference 70% of the time. To validate the applicability of our system, we built a demo to engage with the English-speaking Wikipedia community and find that Side’s first citation recommendation collects over 60% more preferences than existing Wikipedia citations for the same top 10% most likely unverifiable claims according to Side. Our results indicate that an AI-based system could be used, in tandem with humans, to improve the verifiability of Wikipedia. More generally, we hope that our work can be used to assist fact checking efforts and increase the general trustworthiness of information online. 
All our code, data, indexes and models are publicly available at https://github.com/facebookresearch/side.
https://arxiv.org/abs/2207.06220
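
The verification step — scoring how well a candidate source supports a claim — is done by SIDE with neural retrieval and verification models. As a toy stand-in for that scoring, the sketch below substitutes a simple bag-of-words cosine similarity; all names and the scoring function are illustrative assumptions, not the paper's method:

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    common = set(a) & set(b)
    num = sum(a[t] * b[t] for t in common)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def support_score(claim: str, passage: str) -> float:
    """Toy proxy for 'does this passage support the claim?'"""
    return cosine(Counter(claim.lower().split()),
                  Counter(passage.lower().split()))

claim = "the eiffel tower was completed in 1889"
good = "construction of the eiffel tower finished in 1889 in paris"
bad = "bananas are a good source of potassium"
assert support_score(claim, good) > support_score(claim, bad)
```

A real system would rank web passages by such a score and flag citations whose best score falls below a threshold; SIDE replaces the lexical score with learned dense representations.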

4、[CL] TalkToModel: Understanding Machine Learning Models With Open Ended Dialogues


D Slack, S Krishna, H Lakkaraju, S Singh
[UC Irvine & Harvard University]
TalkToModel: understanding machine learning models with open-ended dialogues. ML models increasingly drive critical real-world decisions, yet have also grown more complex and harder to understand. Several techniques for explaining model predictions exist, but practitioners struggle to use them: they often do not know which one to pick or how to interpret the results, and may lack the data-science experience to obtain explanations at all. Most current work also produces one-shot explanations, without letting users follow up with finer-grained questions, which can be frustrating. The paper addresses these challenges with TalkToModel, an open-ended dialogue system for understanding machine learning models, built from three key components: 1) a natural-language interface for dialogue, making model understanding highly accessible; 2) a dialogue engine that adapts to any tabular model and dataset, interprets natural language, maps it to appropriate operations (e.g., feature-importance explanations, counterfactual explanations, showing model errors), and generates text responses; 3) an execution component that runs the operations and ensures explanations are accurate. In quantitative and human evaluations, the system understood user questions about novel datasets and models with high accuracy, demonstrating generalization to new situations.
Machine Learning (ML) models are increasingly used to make critical decisions in real-world applications, yet they have also become more complex, making them harder to understand. To this end, several techniques to explain model predictions have been proposed. However, practitioners struggle to leverage explanations because they often do not know which to use, how to interpret the results, and may have insufficient data science experience to obtain explanations. In addition, most current works focus on generating one-shot explanations and do not allow users to follow up and ask fine-grained questions about the explanations, which can be frustrating. In this work, we address these challenges by introducing TalkToModel: an open-ended dialogue system for understanding machine learning models. Specifically, TalkToModel comprises three key components: 1) a natural language interface for engaging in dialogues, making understanding ML models highly accessible, 2) a dialogue engine that adapts to any tabular model and dataset, interprets natural language, maps it to appropriate operations (e.g., feature importance explanations, counterfactual explanations, showing model errors), and generates text responses, and 3) an execution component that runs the operations and ensures explanations are accurate. We carried out quantitative and human subject evaluations of TalkToModel. We found the system understands user questions on novel datasets and models with high accuracy, demonstrating the system’s capacity to generalize to new situations. In human evaluations, 73% of healthcare workers (e.g., doctors and nurses) agreed they would use TalkToModel over baseline point-and-click systems, and 84.6% of ML graduate students agreed TalkToModel was easier to use.
https://arxiv.org/abs/2207.04154
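
The dialogue-engine step — mapping an interpreted utterance to an explanation operation — can be sketched as a keyword dispatcher. TalkToModel's actual engine parses language into a grammar of operations, so this is only a toy version with made-up operation names and a made-up context dict:

```python
# Toy dialogue-engine dispatch in the spirit of TalkToModel: a user utterance
# is routed to a named operation over a model/dataset context. The keyword
# matching and operation set here are illustrative assumptions.
from typing import Callable, Dict

def feature_importance(ctx: dict) -> str:
    """Placeholder for a feature-importance explanation operation."""
    return "top features: " + ", ".join(ctx["features"][:2])

def show_errors(ctx: dict) -> str:
    """Placeholder for a show-model-errors operation."""
    return f"model accuracy: {ctx['accuracy']:.0%}"

OPERATIONS: Dict[str, Callable[[dict], str]] = {
    "important": feature_importance,
    "errors": show_errors,
    "mistake": show_errors,
}

def respond(utterance: str, ctx: dict) -> str:
    """Map an utterance to the first operation whose keyword it mentions."""
    for keyword, op in OPERATIONS.items():
        if keyword in utterance.lower():
            return op(ctx)
    return "Sorry, I don't understand that question yet."

ctx = {"features": ["age", "income", "zip"], "accuracy": 0.91}
print(respond("Which features are most important?", ctx))
```

The paper's execution component then runs the selected operation against the real model, which is what keeps the generated explanations faithful rather than hallucinated.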

5、[CV] Factorized and Controllable Neural Re-Rendering of Outdoor Scene for Photo Extrapolation


B Zhao, B Yang, Z Li, Z Li, G Zhang...
[Zhejiang University & Baidu & ETH Zürich & Wilfrid Laurier University]
Factorized and controllable neural re-rendering of outdoor scenes for photo extrapolation. Expanding an existing tourist photo from a partially captured scene to the full scene is a desirable experience for photography applications. Although photo extrapolation is well studied, extrapolating a photo (e.g., a selfie) from a narrow field of view to a wider one while keeping a similar visual style is much harder. The paper proposes a factorized neural re-rendering model that produces photorealistic novel views from cluttered outdoor Internet photo collections, enabling controllable scene re-rendering, photo extrapolation, and even extrapolated 3D photo generation. It develops a novel factorized re-rendering pipeline to handle the ambiguity in decomposing geometry, appearance, and illumination, plus a composited training strategy to tackle unexpected occlusions in Internet images. To enhance photo-realism when extrapolating tourist photos, a realism-augmentation process automatically propagates texture details from the narrow captured photo to the extrapolated neural rendering. Experiments and photo-editing examples on outdoor scenes demonstrate superior performance in both photo-realism and downstream applications.
Expanding an existing tourist photo from a partially captured scene to a full scene is one of the desired experiences for photography applications. Although photo extrapolation has been well studied, it is much more challenging to extrapolate a photo (i.e., selfie) from a narrow field of view to a wider one while maintaining a similar visual style. In this paper, we propose a factorized neural re-rendering model to produce photorealistic novel views from cluttered outdoor Internet photo collections, which enables the applications including controllable scene re-rendering, photo extrapolation and even extrapolated 3D photo generation. Specifically, we first develop a novel factorized re-rendering pipeline to handle the ambiguity in the decomposition of geometry, appearance and illumination. We also propose a composited training strategy to tackle the unexpected occlusion in Internet images. Moreover, to enhance photo-realism when extrapolating tourist photographs, we propose a novel realism augmentation process to complement appearance details, which automatically propagates the texture details from a narrow captured photo to the extrapolated neural rendered image. The experiments and photo editing examples on outdoor scenes demonstrate the superior performance of our proposed method in both photo-realism and downstream applications.
https://arxiv.org/abs/2207.06899
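
Why a factorized representation makes re-rendering controllable can be illustrated with a toy intrinsic-image-style factorization (render = albedo × shading). The paper's pipeline is neural and additionally factors out geometry, so this numpy sketch is purely illustrative:

```python
import numpy as np

# Toy factorization: an image is the product of per-pixel surface color
# (albedo) and per-pixel illumination (shading). Because the two factors are
# separate, illumination can be edited without touching appearance.
rng = np.random.default_rng(0)
albedo = rng.uniform(0.2, 0.9, size=(4, 4, 3))   # appearance factor
shading = rng.uniform(0.5, 1.0, size=(4, 4, 1))  # illumination factor

image = albedo * shading  # the "rendered" view

# Re-render the same scene under brighter illumination, appearance unchanged:
relit = albedo * np.clip(shading * 1.5, 0.0, 1.0)
assert relit.shape == image.shape
assert np.all(relit >= image - 1e-12)  # every pixel brighter or equal
```

The same separation is what lets the paper's model swap lighting conditions across Internet photos of one landmark while keeping geometry and appearance fixed.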

A few more papers worth noting:

[AI] GriddlyJS: A Web IDE for Reinforcement Learning


C Bamford, M Jiang, M Samvelyan, T Rocktäschel
[Queen Mary University & Meta AI & UCL]
https://arxiv.org/abs/2207.06105

[CL] DocCoder: Generating Code by Retrieving and Reading Docs


S Zhou, U Alon, F F Xu, Z Jiang, G Neubig
[CMU]
https://arxiv.org/abs/2207.05987

[LG] Don't Start From Scratch: Leveraging Prior Data to Automate Robotic Reinforcement Learning


Don't start from scratch: leveraging prior data to automate robotic reinforcement learning
H Walke, J Yang, A Yu, A Kumar, J Orbik, A Singh, S Levine
[UC Berkeley & UT Austin & Google]
https://arxiv.org/abs/2207.04703

[CV] PI-Trans: Parallel-ConvMLP and Implicit-Transformation Based GAN for Cross-View Image Translation


B Ren, H Tang, Y Wang...
[University of Pisa & ETH Zurich & Fondazione Bruno Kessler (FBK) & University of Trento]
https://arxiv.org/abs/2207.04242
