Hacker News | philipp2310's comments

Yeah, I was quite shocked when I learned about Yandex's Alice...

To be fair: Project Alice is older than Yandex Alice, and there is nothing but the name connecting these two assistants!


True words, thank you. While the project's logo/name has existed longer than I've been in the project, I had my thoughts about it as well.

I came to the conclusion that, with so many car stickers, unlicensed merch, and even companies using a similar logo, there shouldn't be an issue for a small open source project. Maybe a dangerous conclusion that should be revised when Alice grows :)


I'm not sure I'd express it as harshly as TingPing. I like the project and hope it succeeds. Still, it seems somehow unwholesome and invites unnecessary controversy. A good logo represents the project or company, so what does it say that you all are associated with an, erm... borrowed logo?


That attitude doesn't reflect well on the project IMO. Very short-sighted and unprofessional.


Why would they need to be professional? It's an open source hobby project. People are allowed to code for fun.


Hey if you want it, I threw together an umbrella inspired logo that might be a good alternative. It's close, but should be different enough to be distinguishable.

https://i.imgur.com/aPMBndB.png

CC0 and all that of course.


Thank you, my first thought was a mixture of umbrella corp and the scanner drones from half life 2 :)

I put it up for discussion. We don't want to cling to our current design, but we aren't sure yet about a good way to approach a redesign. We need a bit of time to decide which aspects of Alice as a whole should be redesigned.


Now that you mention it, it does remind me of the city scanner too haha

Well if you do go for it or want to make any tweaks to it I've now uploaded the svg source too: https://drive.google.com/file/d/1FIICC0ySnwzBB5NdWcDX3uXDdwO...

I hope you find something that fits :)


are you… also using the battle.net logo on the skill store? why

you’re like actively looking for trouble lol


Appears to be included in font-awesome, which warns that it is trademarked: https://fontawesome.com/icons/battle-net?s=brands

Someone probably just scrolled through available icons and ended up picking that one.


This - plus the skills icons can be chosen freely from font-awesome in skill creation. Maybe we should exclude fa-brand space from the allowed icons. Another good hint, thank you! And yep, the team dearly misses some creative heads for design as everybody involved is from the coding area :)
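A rough sketch of what excluding the brand icons could look like. The function names are hypothetical and not taken from the Alice code base. The only grounded assumption is that Font Awesome marks brand icons with the `fab` style class (version 5) or `fa-brands` (version 6).

```python
# Hypothetical allow-list check for skill icons; not Alice's actual code.
# Font Awesome 5 uses the "fab" style class for brands, version 6 "fa-brands".
BRAND_STYLE_CLASSES = {"fab", "fa-brands"}


def is_allowed_icon(css_classes: str) -> bool:
    """Return False when the class string selects the Font Awesome brands style."""
    tokens = set(css_classes.split())
    return not (tokens & BRAND_STYLE_CLASSES)


def filter_icons(candidates):
    """Keep only the icon class strings that are not brand icons."""
    return [c for c in candidates if is_allowed_icon(c)]
```

With this, `filter_icons(["fab fa-battle-net", "fas fa-umbrella"])` would keep only the umbrella icon.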


Because you can use whatever FontAwesome (for which I'm an early backer) icon when you are creating a skill for Alice using our backend skill creator on her browser interface. We could exclude the use of fa-brands.

Edit https://www.blizzard.com/fr-fr/legal/8bcb0794-6641-4ce3-a573...


Alice offers a few different STT options. You can stay completely offline, but depending on the language and your background noise, the results are indeed not as good as the cloud ones. Coqui STT (formerly DeepSpeech) in particular does a really good job, though!

Another option, if you are a bit more open to "sharing" data and you don't want to miss out on the best TTS, is enabling Google or Azure cloud services. The big difference is that Alice will only send the sound right after detecting your hotword (only while flashing her LED and asking you "yes?" before the recording starts). Nothing else would be shared.
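The gating described here can be pictured as a small filter over the audio frames. This is only an illustrative sketch: `gate_after_hotword`, `is_hotword` and `is_silence` are made-up names standing in for whatever detector is configured, and Alice's real pipeline may be structured quite differently.

```python
from typing import Callable, Iterable, Iterator


def gate_after_hotword(
    frames: Iterable[bytes],
    is_hotword: Callable[[bytes], bool],   # hypothetical hotword detector
    is_silence: Callable[[bytes], bool],   # hypothetical end-of-speech detector
) -> Iterator[bytes]:
    """Yield only frames that arrive after a hotword and before the next silence."""
    recording = False
    for frame in frames:
        if not recording:
            if is_hotword(frame):
                recording = True  # the hotword frame itself is not forwarded
            continue              # everything before the hotword is dropped
        if is_silence(frame):
            recording = False     # stop forwarding until the next hotword
            continue
        yield frame
```

Only what this generator yields would ever be handed to a cloud service; frames outside the hotword window never leave the loop.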


Does it have an option to cache the whole "Hey Alice, what's gonna be the weather at 11 tomorrow?" utterance then if the wake word is detected send what's been cached after it?

That's the main gripe I had with Mycroft, as there was no going around this:

"Hey Mycroft."

long pause

"What's gonna be the weather at 11 tomorrow?"

even longer pause

<answer>

Which is frankly so unnatural and just too annoying to be practical. And it feels like something that should be very straightforward to implement in terms of logic.


For the moment, no, nothing is cached until the hotword is recognized. We did think about it, but it would mean we have to store past sound input for a few seconds. While this wouldn't be a problem for the main device, satellites (Raspi Zero) aren't powerful enough to run ASR themselves, so the sound is streamed to the main device after the hotword detection. This process wouldn't match perfectly with the storing of the data.

Another thing to keep in mind is that we use intermediate results for the ASR. That means the input is already being parsed while you are speaking. Only a few ms after you go silent, the parsing is finalized and NLU/TTS will start right away.

Of course with a bit of bias, I'd say it is more like: "Hey Alice" "Yes?.." "What's gonna be the weather at 11 tomorrow?" short pause (.2 seconds?) <answer>


This is entirely true, and I have a few solutions I could deploy with some work. The problem is that caching like this consumes power: you literally listen all the time, as you do for a wakeword, cache the audio data in memory, and use it ONLY if a wakeword is detected. Big companies do it in the cloud; we could do it locally, as an option. The path I chose to mitigate that unnatural feeling is to use a human-style answer. It's a bit like at home: you're in the kitchen, your wife or kids are further away, and nobody is talking. At some point you'd call your wife, "Alice?", and you'd wait for her to reply "yes?" before talking, since you don't know whether she's focused on you at the moment or playing with the kids or whatever.


I haven't looked into the details, but when listening to a wakeword, surely it has to literally listen all the time anyway?

I mean, would it really consume that much extra power to just have a second sink that's just a N-second circular buffer, so you got the samples after the wakeword ready for speech recognition when the wakeword is detected?
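For a sense of scale, here is a back-of-the-envelope calculation of how much memory such an N-second buffer needs. The defaults (16 kHz, 16-bit, mono) are an assumption about a typical speech-recognition input format, not something stated in this thread.

```python
def buffer_bytes(seconds: float,
                 sample_rate_hz: int = 16_000,  # assumed sample rate
                 bytes_per_sample: int = 2,     # assumed 16-bit PCM
                 channels: int = 1) -> int:     # assumed mono
    """Memory needed to hold `seconds` of raw PCM audio."""
    return int(seconds * sample_rate_hz * bytes_per_sample * channels)


five_second_buffer = buffer_bytes(5)  # 160000 bytes under the assumptions above
```

Under those assumptions a 5-second buffer is about 160 kB, so memory alone is unlikely to be the limiting factor; the extra cost would come from copying and streaming the data.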


Yeah, that's what I said: "as for the wakewords", we listen all the time, looking for a specific wave pattern in the audio and not for words. But the audio is literally always flowing in, on all your satellites and the main unit. The problem with prewarming is that, beyond analysing a wave pattern in the audio stream, we need to keep a much longer dump of audio data in memory in some kind of FIFO pool. Don't get me wrong, it's easily doable; I just haven't taken the time to build something polished that doesn't overconsume on the device running it. Technically, we just need to pool the audio data, say 3-5 seconds depending on the hardware used (the Pi 3 is slow), trim the length of the detected wakeword from the beginning, and append the rest of the incoming data while already streaming to ASR, be it local or cloud based.
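A minimal sketch of that FIFO pool idea, using a bounded `collections.deque`. Everything here is hypothetical: `PreRollBuffer` and `utterance_frames` are made-up names, and the sketch assumes the wakeword detector can report how many frames have arrived since the wakeword ended, which may not match how the real detector works.

```python
from collections import deque
from itertools import chain
from typing import Iterable, Iterator, List


class PreRollBuffer:
    """Fixed-size FIFO holding the most recent audio frames."""

    def __init__(self, max_frames: int) -> None:
        # deque(maxlen=...) silently drops the oldest frame when full
        self._frames = deque(maxlen=max_frames)

    def push(self, frame: bytes) -> None:
        self._frames.append(frame)

    def tail(self, n: int) -> List[bytes]:
        """Return the newest n frames (fewer if the buffer holds fewer)."""
        if n <= 0:
            return []
        return list(self._frames)[-n:]


def utterance_frames(buffer: PreRollBuffer,
                     frames_after_wakeword: int,
                     live: Iterable[bytes]) -> Iterator[bytes]:
    """Buffered speech that followed the wakeword, then the live stream.

    Frames up to and including the wakeword are trimmed by taking only the
    newest `frames_after_wakeword` frames from the buffer.
    """
    return chain(buffer.tail(frames_after_wakeword), live)
```

The result of `utterance_frames` could then be fed to the ASR in place of the live stream alone, so speech that directly follows the wakeword is not lost.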


My guess would be no. After all if you're going through the trouble of setting up a home assistant it'll be mains powered anyway, and the Pis don't actually use that much more power when at max load than when turned off.

I think the ballpark figures for the Pi 4 are 0.5 A when doing nothing, 1 A when doing something intensive on a single core, and 1.2 A when at full multicore load.
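To put those ballpark currents into watts and yearly energy, here is a quick calculation. It assumes the Pi 4's nominal 5 V supply and a device that runs around the clock; the current figures are the rough ones quoted above, not measurements.

```python
PI_SUPPLY_VOLTS = 5.0  # nominal Raspberry Pi 4 supply voltage


def watts(amps: float, volts: float = PI_SUPPLY_VOLTS) -> float:
    """Electrical power drawn at the given current."""
    return amps * volts


def kwh_per_year(power_watts: float) -> float:
    """Energy used in a year of continuous operation."""
    return power_watts * 24 * 365 / 1000


idle_w = watts(0.5)       # doing nothing
full_load_w = watts(1.2)  # full multicore load
extra_kwh = kwh_per_year(full_load_w - idle_w)
```

That works out to roughly 2.5 W idle versus 6 W at full load, or about 30 kWh per year of difference even in the unrealistic case of permanent full load.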


dev here (quite overwhelmed by all the new visitors coming from here, thank you all!): My wife doesn't like female assistants either, for us Alice got a male voice and we just pretend it is Alice Cooper ;)


The local classic rock radio station in Boston used to have Alice Cooper hosting a certain night each week (I think Friday or Saturday, and presumably pre-recorded and remotely. Maybe he still does it? I haven't lived there for a while now). He was pretty funny! I'd honestly love to have a voice assistant that sounds like him; maybe the way to deal with both the question about what gender the assistant's voice should be and not having to worry about having generated pronunciations sound natural is just to pay some celebrity a bunch of money (and/or some royalties for the assistant) to spend a month in a studio recording a huge amount of stuff with varied pitch and tone and vocabulary, and then use that to splice together the assistant's responses. Bonus is bringing them back after a year or so to record more stuff to smooth over any rough edges that are found!


I always wondered if Alice Cooper is doing that for multiple radio stations :) A local German radio station (Radio Bob) has a weekly show with him as well. I guess the same prerecorded stuff with a few custom "you are listening to..." mixed in!

Regarding "just to pay some celebrity".. well, that is currently way out of our budget, as everything offered so far is 100% free and there are no plans to change that :)


> I always wondered if Alice Cooper is doing that for multiple radio stations :) A local German radio station (Radio Bob) has a weekly show with him as well. I guess the same prerecorded stuff with a few custom "you are listening to..." mixed in!

That's super cool! It's probably all the same across the stations, yeah, although it still is quite fun to hear the songs he picks and why. One of the things he would always do was have a pair of songs that shared some theme that he would share afterwards, and it was fun to try to guess. The only two I remember offhand were some song called "Jack the Ripper" and then "Midnight Rambler" by the Rolling Stones with the theme "about serial killers", and then "God Gave Rock and Roll To You" by Argent along with "Since You Been Gone" by Rainbow, which were apparently written by the same guitarist.

> Regarding "just to pay some celebrity".. well, that is currently way out of our budget, as everything offered so far is 100% free and there are no plans to change that :)

Oh, definitely, I wasn't offering this as a completely serious suggestion so much as something that I could totally see one of the major tech companies doing to try to separate themself from the pack


Hijacking since you're dev for a straight answer. There's mentions of multiple voice choices. Is one of them a sampling of the red queen in RE movies? I loved her dialect/voice.


> I loved her dialect/voice.

Ah yes. If ever I want an AI assistant to inform me of my imminent death, I want it to be using that voice.

"Hey Alice, call an ambulance."

"You are going to die."

"Thanks Alice."

https://www.youtube.com/watch?v=wSmYSZGMZj0


Sadly no. CoquiTTS integration, which would bring some voice cloning features, was in the making, but I had to delay it, as it required some newer versions (Python 3.9) and I think it was 64-bit only.

Currently you got the following choice: Pico, Mycroft, Google Standard/Wavenet, Amazon, IBM Watson

And while using one of the cloud variants might cause security concerns, keep in mind, they will only know what to speak, not why (e.g. what your input/request was)


I tried, long ago, using SSML to modify Google WaveNet, but got nothing conclusive.

