Modifying the openglasses project code

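1. Image description (handled by Ollama in the original code): for now there are only two options. The first works the same way as the original code and asks a large vision model to describe the image in a chat-style request; in China, only Alibaba's Tongyi Qianwen (Qwen) currently supports this. The second uses a traditional SDK/API, and currently only Baidu offers it. Baidu's interface is somewhat cumbersome, though: you have to call two endpoints in sequence (the first returns a reqid, and the second uses that reqid to fetch the result).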
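The two-call Baidu flow is a submit-then-poll pattern. Below is a minimal, transport-agnostic sketch; `submit` and `fetchResult` are hypothetical stand-ins for the two Baidu calls (their real URLs, parameters, and response fields are not given here and would have to come from Baidu's documentation), and signalling "not ready yet" by returning `null` is an assumption.

```typescript
// Hypothetical signatures for the two calls: the first returns a request id,
// the second returns the result, or null while it is still being processed.
type SubmitFn = (image: string) => Promise<string>;
type FetchResultFn = (reqId: string) => Promise<string | null>;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

/**
 * Submit once, then poll with the returned reqid until a result arrives
 * or the attempt budget runs out.
 */
async function describeImageTwoStep(
    image: string,
    submit: SubmitFn,
    fetchResult: FetchResultFn,
    maxAttempts = 10,
    intervalMs = 500,
): Promise<string> {
    const reqId = await submit(image);              // call 1: obtain the reqid
    for (let attempt = 0; attempt < maxAttempts; attempt++) {
        const result = await fetchResult(reqId);    // call 2: ask for the result
        if (result !== null) {
            return result;
        }
        await sleep(intervalMs);                    // not ready yet: wait, then retry
    }
    throw new Error(`No result for reqid ${reqId} after ${maxAttempts} attempts`);
}
```

Polling with a fixed interval and an attempt cap keeps the sketch simple; if the real second endpoint returns a status field instead of `null`, only the `result !== null` check would change.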

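import axios from "axios";
// Helpers from the OpenGlass source tree (import paths assumed).
// keys.qianwen (endpoint URL) and keys.qianwen_key (API key) are entries added to keys.ts.
import { backoff } from "../utils/time";
import { trimIdent } from "../utils/trimIdent";
import { keys } from "../keys";

// One content part of a message: either text or an image URL
interface MessageContent {
    type: string;
    text?: string;
    image_url?: {
        url: string;
    };
}
// A chat message: a role plus a list of content parts
interface Message {
    role: string;
    content: MessageContent[];
}
// Qwen vision-language models that accept image input
export type KnownModelQianwen =
    | 'qwen-vl-max-latest'
    | 'qwen-vl-plus'
    | 'qwen-vl-max';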
export async function qianwenInference(args: {
    model: string,
    messages: Message[]
}) {
    const token: string = keys.qianwen_key;
    const headers: { [key: string]: string } = {
        "Content-Type": "application/json", "Authorization": `Bearer ${token}`
    }

    const response = await backoff<any>(async () => {

        let converted: { role: string, content: MessageContent[] }[] = [];
        for (let message of args.messages) {
            console.log("message=", message)
            let content_tmp: MessageContent[] = []
            
            for (let content of message.content) {
                content_tmp.push({
                    type: content.type,
                    text: content.text ? trimIdent(content.text) : undefined,
                    image_url: content.image_url ? {
                        url: content.image_url.url
                    } : undefined
                })
            }
            converted.push({
                role: message.role,
                content: content_tmp
            });
        }
        const payload = {
            model: args.model,
            messages: converted,
        }
        let resp = await axios.post(keys.qianwen, payload, {
            headers: headers
        });
        // console.log("resp=",resp)
        return resp['data']["choices"][0]["message"]["content"];

    });
    return trimIdent(((response ?? '') as string));
}
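To call `qianwenInference` with a photo from the glasses, the image has to go into an `image_url` content part. A minimal sketch, assuming the camera frame is available as a base64-encoded JPEG string and that `keys.qianwen` points at DashScope's OpenAI-compatible chat endpoint, which accepts `data:` URLs for images; `buildVisionMessages` is a hypothetical helper, and the interfaces are repeated so the snippet stands alone.

```typescript
// Same shapes as the MessageContent / Message interfaces above.
interface MessageContent {
    type: string;
    text?: string;
    image_url?: { url: string };
}
interface Message {
    role: string;
    content: MessageContent[];
}

/**
 * Hypothetical helper: wrap a base64 JPEG and a question into the
 * OpenAI-style multimodal message list that qianwenInference expects.
 */
function buildVisionMessages(base64Jpeg: string, question: string): Message[] {
    return [
        {
            role: "user",
            content: [
                // The image is sent inline as a data URL rather than a public link.
                { type: "image_url", image_url: { url: `data:image/jpeg;base64,${base64Jpeg}` } },
                { type: "text", text: question },
            ],
        },
    ];
}
```

Usage would then look like `await qianwenInference({ model: "qwen-vl-max", messages: buildVisionMessages(frame, "Describe this image.") })`, where `frame` is the hypothetical base64 string of the captured photo.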

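2. Groq: its role here is plain LLM chat, so in principle any HTTP API or SDK would do. One detail, however, rules out a large number of options: almost every Node.js SDK depends on Node's built-in util module, which is not available in the browser. When the React front-end project starts, it serves the app at localhost:8081, which has to be opened in a browser, so all of these SDKs are ruled out.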

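The only option left is to call the HTTP API directly with axios, and both Baidu's and Alibaba's work for this. Baidu's HTTP API has one more drawback, though: it cannot be called directly from localhost:8081, because the request fails with: from origin 'http://localhost:8081' has been blocked by CORS policy: No 'Access-Control-Allow-Origin' header is present on the requested resource. Alibaba's API does not have this problem, so we go with Alibaba.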

If you really want to use Baidu, you can launch a browser window with web security disabled, and the cross-origin error goes away. The commands are:

cd "C:\Program Files\Google\Chrome\Application"
chrome.exe --user-data-dir="C://Chrome dev session" --disable-web-security

Here is the call to Alibaba's Tongyi API:

import axios from "axios";
import { keys } from "../keys";  // path assumed (OpenGlass source tree)

const token: string = keys.qianwen_key;
const headers: { [key: string]: string } = {
    "Content-Type": "application/json", "Authorization": `Bearer ${token}`
}
export async function tongyiRequest(systemPrompt: string, userPrompt: string) {
    try {
        console.info("Calling tongyiRequest qwen-plus")
        const response = await axios.post(keys.qianwen, {
            model: "qwen-plus",
            messages: [
                { role: "system", content: systemPrompt },
                { role: "user", content: userPrompt },
            ],
        }, { headers });
        console.info("tongyiRequest response = ", response)
        return response.data.choices[0].message.content;
    } catch (error) {
        console.error("Error in tongyiRequest:", error);
        return null; // or handle error differently
    }
}
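The request body and the field path `choices[0].message.content` follow the OpenAI chat-completions format, which `keys.qianwen` is assumed to point at (DashScope's OpenAI-compatible mode). Pulling payload construction and reply extraction into small typed helpers makes that shape explicit and turns a malformed response into a clear error instead of a TypeError. The helper names and the `ChatCompletionResponse` type are illustrative assumptions covering only the fields used here.

```typescript
interface ChatMessage {
    role: "system" | "user" | "assistant";
    content: string;
}

// Only the fields this code reads; the real response carries more.
interface ChatCompletionResponse {
    choices?: { message?: { content?: string } }[];
}

// Build the same body tongyiRequest sends: a system prompt plus one user turn.
function buildChatPayload(systemPrompt: string, userPrompt: string, model = "qwen-plus") {
    const messages: ChatMessage[] = [
        { role: "system", content: systemPrompt },
        { role: "user", content: userPrompt },
    ];
    return { model, messages };
}

// Read choices[0].message.content, failing loudly if it is missing.
function extractReply(data: ChatCompletionResponse): string {
    const content = data.choices?.[0]?.message?.content;
    if (typeof content !== "string") {
        throw new Error("Unexpected chat completion response: " + JSON.stringify(data));
    }
    return content;
}
```

Inside `tongyiRequest` these would be used as `extractReply((await axios.post(keys.qianwen, buildChatPayload(systemPrompt, userPrompt), { headers })).data)`.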

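3. OpenAI's TTS API, i.e., the text-to-speech API. The author tested the APIs of Tencent, Baidu, Alibaba, Microsoft, iFlytek, and other platforms and found Baidu's the simplest to use; none of the others met the requirements. Here is the Baidu code: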

Note: used directly, this also triggers the cross-origin error; use the workaround described in point 2.

import axios from "axios";

const AK = "**"  // Baidu API Key
const SK = "**"  // Baidu Secret Key

/**
 * Generate an access token from the AK/SK pair.
 * @return Promise resolving to the access token string
 */
async function getAccessToken(): Promise<string> {
    const res = await axios.post(
        'https://aip.baidubce.com/oauth/2.0/token?grant_type=client_credentials&client_id=' + AK + '&client_secret=' + SK
    );
    return res.data.access_token;
}

// audioContext is not defined in this snippet: it is assumed to be an AudioContext
// created elsewhere in the same module.
export async function BaiduHttpApiTextToSpeech(text: string) {
    console.log("BaiduHttpApiTextToSpeech text = ", text)
    // Fetch the token once and reuse it in the request below
    const token = await getAccessToken()
    console.log("BaiduHttpApiTextToSpeech token = ", token)

    // Form-encoded request body
    const body = new URLSearchParams({
        'tex': text,                                  // text to synthesize
        'tok': token,
        'cuid': 'XB1oEeNOJVjS5oYGQYlbdrWDwPcIJbCP',   // client/device identifier
        'ctp': '1',                                   // client type (1 = web)
        'lan': 'zh',                                  // language
        'spd': '5',                                   // speed
        'pit': '5',                                   // pitch
        'vol': '5',                                   // volume
        'per': '1',                                   // voice
        'aue': '3'                                    // audio format: 3 = mp3
    });

    const response = await axios.post("https://tsn.baidu.com/text2audio", body, {
        headers: {
            'Content-Type': 'application/x-www-form-urlencoded',
            'Accept': '*/*'
        },
        responseType: 'arraybuffer'  // receive the audio as raw binary data
    });
    console.log("BaiduHttpApiTextToSpeech response=", response)

    // On failure Baidu returns a JSON error body instead of audio
    const contentType = String(response.headers['content-type'] ?? '');
    if (!contentType.startsWith('audio')) {
        throw new Error("Baidu TTS error: " + new TextDecoder().decode(response.data));
    }

    // Decode the audio data asynchronously
    const audioBuffer = await audioContext.decodeAudioData(response.data);

    // Create an audio source and play it immediately
    const source = audioContext.createBufferSource();
    source.buffer = audioBuffer;
    source.connect(audioContext.destination);
    source.start();
}
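`BaiduHttpApiTextToSpeech` fetches a new access token on every call. Baidu's OAuth response also carries an `expires_in` field (in seconds; the token is valid for about 30 days), so the token can be cached and reused. A minimal sketch with the fetcher injected so it runs without the network; `makeTokenCache`, the 60-second safety margin, and the two-field `TokenResponse` shape are assumptions.

```typescript
// The two fields of Baidu's OAuth token response that this sketch relies on.
interface TokenResponse {
    access_token: string;
    expires_in: number; // lifetime in seconds
}

/**
 * Wrap a token fetcher so the token is reused until shortly before it expires.
 * `now` is injectable for testing; it defaults to the wall clock in milliseconds.
 */
function makeTokenCache(
    fetchToken: () => Promise<TokenResponse>,
    marginSeconds = 60,
    now: () => number = () => Date.now(),
): () => Promise<string> {
    let cached: { token: string; expiresAt: number } | null = null;

    return async () => {
        if (cached !== null && now() < cached.expiresAt) {
            return cached.token;  // still valid: no network call
        }
        const res = await fetchToken();
        cached = {
            token: res.access_token,
            // Refresh a little early so a token never expires mid-request.
            expiresAt: now() + (res.expires_in - marginSeconds) * 1000,
        };
        return res.access_token;
    };
}
```

In the TTS code this would be wired up as `const getCachedToken = makeTokenCache(async () => (await axios.post(tokenUrl)).data)`, with `tokenUrl` a hypothetical constant holding the same OAuth URL as in `getAccessToken`, and `'tok': await getCachedToken()` in the request body.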

VI. Launching

After starting the app with npm start, you can test the features above.

Modifying the openglasses project code - Zhihu
