Show HN: Visualize how HN/Reddit talk about your company and products (brandimage.io)
ken 1575 days ago
I was at a company that tried "sentiment analysis" 10 or 15 years ago. It was impossible to get right. The results were useless. It's funny how nothing seems to have changed. There's always some executive who's heard of "sentiment" and thinks they can spend a week shoving a bunch of text into a database, and then have it tell you whether the words are perceived as good or bad.

- I typed "Chevrolet" and it says "mpg" and "venerable" have negative sentiment, while "underachieving", "supposedly", "kilometers", and "cars" have positive sentiment. "Stunning" and "awkward" are both neutral.

- I tried "Mazda" and it was only slightly better. "Praise" is neutral, "regret" is positive, and "touring" is negative.

- I tried "Petzl" and it only shows 16 words, all neutral, and that includes stop words like "had" and "and" and "etc".

Those are at least companies with unique names. For companies whose names are common words, you might be lucky to get keywords related to the right company at all.

I think there can be good uses for word clouds, but they are few and far between, and this isn't one. Just make 3 lists, side by side, and title them "positive", "neutral", and "negative". Instead of font size, put the more common words higher on their respective list. The only reason I can see to use a word cloud here is to hide how bad the analysis is.
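The three-list layout suggested above is simple to produce. Below is a minimal sketch, assuming each word already has one overall count and one sentiment label; the function names and the sample words and counts are invented for illustration and do not come from the app.

```python
def three_lists(words):
    """Group (word, count, label) tuples into positive/neutral/negative lists.

    Within each list, more frequent words come first (ties broken alphabetically).
    """
    lists = {"positive": [], "neutral": [], "negative": []}
    for word, count, label in sorted(words, key=lambda w: (-w[1], w[0])):
        lists[label].append(word)
    return lists


def render(lists, width=14):
    """Render the three lists as side-by-side plain-text columns."""
    headers = ["positive", "neutral", "negative"]
    rows = max(len(lists[h]) for h in headers)
    lines = ["".join(h.ljust(width) for h in headers).rstrip()]
    for i in range(rows):
        cells = [lists[h][i] if i < len(lists[h]) else "" for h in headers]
        lines.append("".join(c.ljust(width) for c in cells).rstrip())
    return "\n".join(lines)


# Hypothetical words and counts, for illustration only.
SAMPLE = [("fun", 5, "positive"), ("reliable", 12, "positive"),
          ("dealer", 7, "neutral"), ("rust", 4, "negative")]
print(render(three_lists(SAMPLE)))
```

Ranking by position rather than font size makes it easy to check each label against the list it sits in.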

jonathanbgn 1574 days ago
Let me elaborate a bit more on how the app computes sentiment. For a particular word, its sentiment is the average sentiment of the sentences that contain both the word and the brand name (in order to identify the sentiment targeted at the brand, not just the overall sentiment).
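A minimal sketch of the averaging described above, assuming the messages are already split into sentences and that some sentence-level scorer is available. The toy lexicon scorer is a hypothetical stand-in for TextBlob or BERT and is not the app's model; the tokenizer is likewise an assumption.

```python
import re
from collections import defaultdict
from statistics import mean


def tokenize(sentence):
    """Lowercase word tokens; punctuation and emoticons are dropped."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def word_sentiments(sentences, brand, score):
    """Average sentence score per word, over sentences that mention the brand.

    score: any callable mapping a sentence to a number (e.g. in [-1, 1]).
    """
    brand = brand.lower()
    per_word = defaultdict(list)
    for sentence in sentences:
        tokens = set(tokenize(sentence))
        if brand not in tokens:
            continue  # only sentences containing the brand name count
        s = score(sentence)
        for word in tokens - {brand}:
            per_word[word].append(s)
    return {word: mean(scores) for word, scores in per_word.items()}


# Hypothetical stand-in scorer: a tiny lexicon plus a bonus for ":)".
LEXICON = {"love": 1.0, "great": 1.0, "hate": -1.0, "awful": -1.0}


def toy_score(sentence):
    s = sum(LEXICON.get(t, 0.0) for t in tokenize(sentence))
    if ":)" in sentence:
        s += 1.0
    return max(-1.0, min(1.0, s))


SENTENCES = ["Buy a Mazda, you won't regret it :)",
             "I hate how my Mazda rattles",
             "Any good Mazda dealers near me?",
             "I love my bike :)"]
print(word_sentiments(SENTENCES, "Mazda", toy_score)["regret"])
```

With this scheme a word inherits the sentiment of its whole sentence, which is how "regret" ends up positive in the Mazda example.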

For example, in the case of Mazda, where you say that "regret" is classified as positive, if you look at the message it comes from, you can see the original sentence: "Buy a Mazda, you won't regret it :)"

I agree with you that the word cloud is not useful on its own, and this is why you can click on a word to see the actual messages. Think of the word cloud as merely an entry point into a more detailed analysis by a human.

Thanks for the feedback.

swalsh 1575 days ago
This is a really cool idea. There are other products out there that do the same thing, but there is definitely room to differentiate yourself.

One interesting issue: you might need to read the context a bit more to gauge whether the post is actually talking about the brand. For example, Target is a common English word, and if you look at the results, only one usage actually referred to the store Target.

One bonus that would be useful: I'd like to be able to compare sentiment by subreddit. If I'm marketing on Reddit, I would not target "Reddit" as a whole.
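A per-subreddit breakdown like the one requested above could be a simple group-by over scored messages. This is a sketch only; the `subreddit` and `sentiment` field names and the sample records are assumptions, not the app's data model.

```python
from collections import defaultdict


def sentiment_by_subreddit(messages):
    """Return {subreddit: (mean_sentiment, message_count)}, most-discussed first.

    messages: iterable of dicts with hypothetical 'subreddit' and 'sentiment' keys.
    """
    groups = defaultdict(list)
    for m in messages:
        groups[m["subreddit"]].append(m["sentiment"])
    summary = {sub: (sum(v) / len(v), len(v)) for sub, v in groups.items()}
    # Stable sort: subreddits with equal counts keep first-seen order.
    return dict(sorted(summary.items(), key=lambda kv: -kv[1][1]))


SAMPLE = [{"subreddit": "cars", "sentiment": 1.0},
          {"subreddit": "cars", "sentiment": 0.0},
          {"subreddit": "mazda", "sentiment": -1.0}]
print(sentiment_by_subreddit(SAMPLE))
```

Reporting the count next to the mean matters here: a subreddit with one message can show an extreme average that means little.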

jonathanbgn 1575 days ago
Thanks for the feedback! It's a great idea. I'll definitely implement Named Entity Recognition or another contextual model in the future to distinguish the brand from common words.
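Until a proper NER model is in place (for example, an ORG entity from a library such as spaCy), a much cruder filter can catch some of the "Target" cases described above. The sketch below is explicitly *not* NER. It is a hypothetical capitalization heuristic: it counts a mention only when the word appears with the brand's exact capitalization somewhere other than the first word of the sentence.

```python
import re


def refers_to_brand(sentence, brand):
    """Crude heuristic, not named entity recognition.

    True only if `brand` appears with exact capitalization after the first
    word. The first word is skipped because it is capitalized anyway.
    """
    words = re.findall(r"[A-Za-z0-9']+", sentence)
    return any(w == brand for w in words[1:])


print(refers_to_brand("I bought socks at Target today", "Target"))
print(refers_to_brand("Our target is growth", "Target"))
```

The trade-off is recall: it misses sentence-initial mentions and the all-lowercase spelling common in comments, which is why a contextual model is the better long-term fix.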

As for the subreddit, it's already on my next features list :)

moralestapia 1575 days ago
Good idea in principle, but did you even test it before releasing it to the public?

For a particular term, I get the following words with "positive" sentiment: boycott, punishment, greed, backlash, protesters, damaging, upset.

How in hell is that positive?

jonathanbgn 1575 days ago
That's a fair criticism. Sentiment analysis is quite hard to get right on social media messages because of diversity, subtlety, and many other aspects. From my experience with similar commercial and (very!) expensive products, their accuracy is far from perfect too.

Also consider the lack of labeled data for HN and Reddit messages: I had to use Twitter messages to train the classifiers.

This is why I experimented with BERT, to see whether I could get a model to generalize well from Twitter messages alone. In my experiments, if you activate BERT (which makes the app much slower), you should get 60-70% accuracy.

It's not perfect, but it's not too bad either when you average over a large number of messages.

Overall it's still a work in progress, and I expect to greatly improve the accuracy over the coming weeks!

chimi 1575 days ago
I came here to say the same thing as the GP. I don't understand why some words are red or green.

For example, you can type in non-brand words as well. I typed in "houses" and the word "homeless" came up in green!

With a brand, Facebook, I got the word "amiriteguyze" in red, and clicking on it shows:

Negative 11/19/2019, 12:13:31 PM

facebook is bad amiriteguyze?!?!?!?

Why is that even a word that would show up in the word cloud? I can't imagine it was entered a bunch of times. I can't intuit any correlation between the colors, sizes, or words themselves that show up in the clouds.

jonathanbgn 1574 days ago
The algorithm gives more weight to words that appear rarely and are only used with the chosen brand name (similar to TF-IDF). This is why weird words sometimes surface in the word cloud, especially when the sample of messages is small.
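The app's exact formula isn't given beyond "similar to TF-IDF". The sketch below is a generic TF-IDF-style ranking under that assumption, not the app's implementation. The term frequency is counted over the brand's messages, and the inverse document frequency over brand plus background messages. The sample messages are invented.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def distinctive_words(brand_messages, background_messages, top_n=5):
    """Rank words frequent in brand messages but rare overall (TF-IDF style).

    tf  = occurrences of the word across the brand's messages
    idf = log(total documents / documents containing the word)
    """
    docs = [set(tokenize(m)) for m in brand_messages + background_messages]
    df = Counter(word for doc in docs for word in doc)
    n_docs = len(docs)
    tf = Counter(word for m in brand_messages for word in tokenize(m))
    # Every word in tf occurs in at least one brand document, so df[w] >= 1.
    scores = {w: count * math.log(n_docs / df[w]) for w, count in tf.items()}
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_n]


BRAND = ["facebook is bad amiriteguyze", "facebook is huge"]
BACKGROUND = ["the weather is bad", "python is great", "bad code is bad"]
print(distinctive_words(BRAND, BACKGROUND))
```

In this toy run, the one-off "amiriteguyze" scores exactly as high as "huge" simply because both occur in a single document. With few messages, such rare tokens easily reach the top of the cloud.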

To prevent those words from appearing, I was thinking of implementing a dictionary check to only allow meaningful words. However, this approach also has drawbacks: it restricts the words people can use and can miss important new concepts.
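A dictionary check like the one described could be as small as a set lookup. In this sketch, the word list is a hypothetical placeholder for a real lexicon, and it shows the drawback mentioned above: a legitimate new term ("BERT") is dropped along with the noise.

```python
def dictionary_filter(words, dictionary):
    """Keep only words present in `dictionary` (a set of lowercase words).

    Drawback: legitimate terms missing from the list are dropped too.
    """
    return [w for w in words if w.lower() in dictionary]


# Hypothetical, deliberately tiny word list for illustration.
KNOWN_WORDS = {"privacy", "ads", "huge"}
print(dictionary_filter(["amiriteguyze", "privacy", "BERT", "Ads"], KNOWN_WORDS))
```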

Thanks for the feedback.

BubRoss 1575 days ago
To be clear, you made something that doesn't work, posted it and got attention because you asserted that it worked, and when people point out that it doesn't work, you say 'it is hard and other people's software also doesn't work'.
jonathanbgn 1574 days ago
This is not what I said. I said that the accuracy is not 100% perfect, but that you can improve it by turning on BERT in the menu bar.
BubRoss 1574 days ago
Everyone else said it for you
screaminghawk 1575 days ago
Similarly, I saw "dumb" marked as positive. In context it was negative, but the post as a whole was positive.
inamesh 1575 days ago
I guess it's one of those things where hate and love are good but mediocre is bad
vecplane 1575 days ago
Getting an error on https://brandimage.io/

Over Quota

This application is temporarily over its serving quota. Please try again later.

jonathanbgn 1575 days ago
Thanks for letting me know. I just increased the quota with the hosting service, and the website is online again now!
lganzzzo 1572 days ago
Cool stuff!

I found an issue with "+" symbols in the brand name. Check out this URL - https://brandimage.io/insight/c++?source=hn

If a "+" symbol is present, the UI doesn't show anything.
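The thread doesn't say where the "+" gets lost. One common cause is that "+" is read as a space under form/query-string decoding. A minimal sketch of the usual remedy is to percent-encode the brand name whenever an insight URL is built. The helper name is hypothetical, and this assumes the server decodes `%2B` back to "+".

```python
from urllib.parse import quote


def insight_url(brand, source="hn"):
    """Build an insight URL with the brand percent-encoded.

    With safe="", quote() encodes '+' as %2B (and a space as %20), so the
    brand can't be misread as containing spaces.
    """
    return (f"https://brandimage.io/insight/{quote(brand, safe='')}"
            f"?source={quote(source, safe='')}")


print(insight_url("c++"))
```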

kmcquade 1575 days ago
Pretty awesome that it’s open source.

For large companies there are some proprietary solutions for this. Example:

https://www.trendkite.com/

Disclaimer: family member works there so that’s the reason I’m aware that this niche exists.

criddell 1575 days ago
I kind of hate this idea.

It's one thing if companies want to find out what people are saying about them so they can improve. However, I'd be surprised if that's how it is used.

This feels more like a way to measure the effectiveness of their astroturfing or low-key marketing efforts.

RemingtonLak 1574 days ago
Good job. I can certainly see some potential. Our company used Clara in the past. I actually wrote my own in an xls spreadsheet, if you can believe it, but it was for a highly vertical, tightly scoped topic, so it was relatively easy. Where are you physically located?
jonathanbgn 1574 days ago
Thanks for your message. Like you said, it's a real challenge when the domain is as broad as social media. I'm currently based in Taipei.
guillaumec 1575 days ago
Interesting project, and I like the simple UI. How is the sentiment analysis done? What kind of back-end did you use?
jonathanbgn 1575 days ago
Thanks! The basic (default) sentiment analysis is based on the TextBlob library, but you can choose to activate deep learning to analyze sentiment with Google AI's BERT (trained on Twitter messages), though it is quite slow at the moment because inference runs on a CPU, not a GPU.
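TextBlob's default sentiment returns a `polarity` in [-1.0, 1.0]. Turning that into the positive/neutral/negative colors needs a cutoff. The sketch below assumes a small neutral band, and the 0.05 threshold is a hypothetical choice, not a TextBlob default or the app's setting. The TextBlob import is deferred so the mapping can be tried without the package installed.

```python
def polarity_to_label(polarity, neutral_band=0.05):
    """Map a polarity in [-1, 1] to a 3-class label.

    neutral_band is an assumed threshold, not a TextBlob default.
    """
    if polarity > neutral_band:
        return "positive"
    if polarity < -neutral_band:
        return "negative"
    return "neutral"


def classify(text):
    """Label a message using TextBlob's default (lexicon-based) analyzer."""
    from textblob import TextBlob  # third-party: pip install textblob
    return polarity_to_label(TextBlob(text).sentiment.polarity)


print(polarity_to_label(0.5), polarity_to_label(-0.3), polarity_to_label(0.0))
```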

The back-end is just Python/Flask, and I use the free Algolia and Pushshift.io APIs to source the messages from HN and Reddit (big thanks to them!)
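As an illustration of the HN side, here is a minimal sketch of querying the public HN Search API (hn.algolia.com) for comments that mention a brand. Only the URL builder is exercised here. The fetch function makes a real network request, and the page size of 50 is an arbitrary assumption. Returned `comment_text` fields may contain HTML markup.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ALGOLIA_HN_SEARCH = "https://hn.algolia.com/api/v1/search"


def hn_comment_search_url(brand, hits_per_page=50):
    """Build a search URL restricted to comments; urlencode escapes '+' etc."""
    params = {"query": brand, "tags": "comment", "hitsPerPage": hits_per_page}
    return f"{ALGOLIA_HN_SEARCH}?{urlencode(params)}"


def fetch_hn_comments(brand):
    """Network call: return the raw comment texts from the first results page."""
    with urlopen(hn_comment_search_url(brand)) as resp:
        data = json.load(resp)
    return [hit.get("comment_text") or "" for hit in data["hits"]]


print(hn_comment_search_url("c++"))
```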

uberneo 1574 days ago
This looks really great, with such a simple UI. Last year I tried to do real-time sentiment analysis on Twitter messages using TextBlob. It was fast but not very accurate. Can you suggest another library that would be fast enough for real-time messages?
jonathanbgn 1573 days ago
For inference speed, I recommend a Naive Bayes model. I've tried this on Twitter messages and got close to 90% accuracy with 3 classes (positive, negative, neutral).

The easiest library for that would probably be scikit-learn, with its ComplementNB class: https://scikit-learn.org/stable/modules/generated/sklearn.na...

For the data, you can use the SemEval 2017 Task4-A dataset (around 10K labeled tweets): https://github.com/cbaziotis/datastories-semeval2017-task4/t...
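In scikit-learn this would typically be a `CountVectorizer` feeding `ComplementNB`. To keep the example dependency-free, the sketch below is a from-scratch version of the same technique, Complement Naive Bayes, without scikit-learn's optional weight normalization. Each class is scored by how well the *other* classes explain the message, and the predicted class is the one whose complement fits worst. The six training sentences are invented toy data, not SemEval, and say nothing about accuracy.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


class TinyComplementNB:
    """Complement Naive Bayes over bag-of-words counts (unnormalized weights)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive (Laplace) smoothing

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            counts[label].update(tokenize(text))
        total = Counter()
        for c in self.classes:
            total.update(counts[c])
        self.vocab = set(total)
        self.log_weights = {}
        for c in self.classes:
            # Complement counts: occurrences of each word in every class EXCEPT c.
            comp = {w: total[w] - counts[c][w] + self.alpha for w in self.vocab}
            denom = sum(comp.values())
            self.log_weights[c] = {w: math.log(comp[w] / denom) for w in self.vocab}
        return self

    def predict(self, text):
        tokens = [t for t in tokenize(text) if t in self.vocab]
        # Log-likelihood of the message under each class's complement;
        # the lowest score (worst complement fit) wins.
        scores = {c: sum(self.log_weights[c][t] for t in tokens) for c in self.classes}
        return min(scores, key=scores.get)


# Invented toy data for illustration only.
TRAIN_TEXTS = ["love this great car", "great service love it",
               "terrible car hate it", "awful service hate this",
               "the car is blue", "the service is open"]
TRAIN_LABELS = ["positive", "positive", "negative", "negative", "neutral", "neutral"]

model = TinyComplementNB().fit(TRAIN_TEXTS, TRAIN_LABELS)
print(model.predict("hate this awful car"))
```

Prediction is one dictionary lookup per token per class, which is why Naive Bayes variants suit the real-time use case in the question.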

guillaumec 1575 days ago
Thanks for the answer!
blader_johny 1575 days ago
We made something similar at a hackathon: https://spicy-lip.surge.sh/features
mk74160 1575 days ago
your link is dead
kory 1575 days ago
Interesting concept, but the models powering it need a lot of work. For example, searching "Apple" brings up "cider".
detaro 1575 days ago
negative: "employer", "apps", "perf", "idioms", "culture", "humanity"

positive: "closes", "binary", "btw", "disclaimer", "bullshit"

The word cloud seems fairly useless.

JamesQuigley 1575 days ago
Getting a "503 Over Quota" when trying to access your site :(
sali0 1575 days ago
This is fantastic, good work. Love the idea.
mk74160 1575 days ago
Very cool.
jonathanbgn 1575 days ago
Thanks!
pressurefree 1575 days ago
https://brandimage.io/insight/https://pixabay.com/ didn't work! I had to remove the "https://" that I copied and pasted from the address bar:

https://brandimage.io/insight/pixabay.com/

The words seem off, but it's a good idea.
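The pasted-URL case above could be handled by normalizing the input before searching. This is a minimal sketch under assumed rules (keep only the host when a scheme is present, drop a leading "www.", trim a trailing slash). The function name and those rules are hypothetical, not the app's behavior.

```python
from urllib.parse import urlparse


def normalize_brand_input(raw):
    """Turn a pasted URL like 'https://pixabay.com/' into 'pixabay.com'.

    Plain brand names pass through unchanged (apart from surrounding whitespace
    and a trailing slash).
    """
    raw = raw.strip()
    if "://" in raw:
        host = urlparse(raw).netloc
        return host[4:] if host.startswith("www.") else host
    return raw.rstrip("/")


print(normalize_brand_input("https://pixabay.com/"))
```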
