Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I feel the need to nitpick your example.

> does the candidate want to code it up or understands that they could do: cut | sort| head

Piping together commands is undoubtedly programming, just in ancient shell script. So in a sense you're really testing for bash expertise. Which is maybe really relevant to you but I wouldn't say you're really "avoiding" coding by knowing to use those commands together, you just decided to code it with obscure shell commands. I might fail your test by choosing to write that sort of thing in python, because I could write it much more reliably in 10 minutes than deal with all the incredibly weird things that can happen in bash. I mean the final product in bash might be slightly more elegant, but your terminal history is probably littered with "man cut" and various attempts at it.

But for your example, my answer might be: if you're querying this file once, you may be querying it again, it's probably just way simpler to load the thing into sqlite instead of trying to imitate sql with some janky unix commands.

> everyone with real work experience has a story about sorting.

I am 34 and an experienced coder and I literally have no stories about sorting, and I've never once wanted to sort a CSV file on the terminal.



sqlite answer - excellent, that is exactly what I am looking for, things people did, potential solutions let me understand the candidate's real background - not the buzzwords

what is not well received is the judgmental tone, passing judgment about me for things you cannot possibly know, no need for that either, simple questions also irritate some, very important to weed those people out too,

I expect you would fail the test because of the attitude, even though if this were a real job interview you'd do your best to suppress it, it would come out


>what is not well received is the judgmental tone, passing judgment about me for things you cannot possibly know

But this is the irony---a job interview is a judgment. Why do you think feelings on this run so high?


I expect you would fail the test because of the attitude

How is this relevant? Now you're just taking cheap shots.


I don't get it, honestly.

I would not recommend a candidate, who, when asked if they could do this with cut|sort|head would reply something like:

heh, what a pathetic question, I bet your history is full of "man cut"

it is not the right answer, it is needlessly obnoxious and indicates a person that can barely bottle up their emotions and quickly gives in under pressure. Usually not a good match to any team - unless they also bring in something massively beneficial.


Culture fit. Is the candidate likely to reject/scoff at certain tasks because they think it's below them?


"it appears you're implicitly looking for bash knowledge, which is unfair".

"I wouldn't hire you with that attitude"

I'm getting more judgemental vibes from interviewer than interviewee.


I do believe overgard was just making a point and not expecting a job offer from you. I find these "I'd never hire you" comments so obnoxious... At least you could tell us where you work and your name so we could ask for another interviewer if we ever apply there :-)


Whew, My first thought was import the thing to database, use database engine to sort.


Dear God, I hope you don't do interviews. Talking about the "attitude" of someone you don't know, taking criticism personally, thinking that you are weeding people out by trying to "irritate some", suggesting they are trying to suppress it, and (of course) that you will be detect it. Every thread on hiring about HN has these weird passive-aggressive interviewers...interviewing is hard, it is really something that people should be trained to do (and some people really won't ever be able to do it).


I was with you until you called unix tools janky.. how dare you! semi joking :]

there is some simple elegance and power to these old C tools


> I am 34 and an experienced coder and I literally have no stories about sorting, and I've never once wanted to sort a CSV file on the terminal.

I'm a bit younger, but have done this dozens and dozens of times.

----

A lot of one off processes are way easier to handle with a bunch of terminal commands and pipes.


Only if you do them often enough that you don’t forget the flags each time.


Dude, do you even data science?

Not knowing Unix tools like cut and sort is a hard fail on a senior individual contributor in data science role, as is using sqlite which totally doesn't scale the way sort and cut does. Separates sheep from goats in data science land. You should really learn them if you're in the field and work with reasonably big data sets. Frankly you should learn them if you work with data at all, ever.

I've literally seen FAANGs recommendation engines powered with these tools running nightly on someone's desktop.


Well, they do work, but streaming processing in Python is not difficult and it’s much easier to have graceful failure processing and self-documenting code. Not to say of extending the logic if it would ever be needed.

But maybe that’s more about separating sheep from shepherds.


I’m not sure if I should be horrified or not. Both by the fact this happens, and by the fact that you seem proud of it.

Learning sort and cut takes literally takes all of 10 minutes, so if it makes you pass over an otherwise qualified candidate you have your priorities completely backwards.


Feel free to be horrified: a data scientist who doesn't understand where and why to use unix command line tools for data preparation and ETL is about as useful to me as one who doesn't understand the conditions where a t-test breaks down or what a ROC curve is.

Generally speaking, people like this have never actually dealt with large data sets, never dealt with issues involved with installing "unapproved software" on a machine (ridiculously common in The Real World), has probably never cleaned a dirty data set (what do you do when your giant csv is formatted in a way that Wes McKinney didn't think of?), and will in a senior role be a long term liability for a data science team that works on serious problems. Sure at one point I didn't know about them either: I wasn't a senior data scientist then. I submit that if you don't know about them and haven't actually used them, you aren't either.


I think that the people not being impressed by cut and sort are approaching this from the Linux end of things, where those tools are nothing special at all. I guess we kind of expected that the data science wizards would be using fancier tools.


Yeah, well, people who have enough self regard to think of themselves as "wizards" are super unlikely to be able to actually do the day to day grind of getting, cleaning and preparing data for feature generation, which is about 95% of the job.

Another good weeder for a person claiming to be senior: discuss how you would fix the performance of the default R naive Bayes implementation in e1071. It's numerically more or less correct, but written by deranged ape-men who don't understand how computers work (a problem in a lot of the R ecosystem; in the Python ecosystem, the problem is nobody has yet written algorithms for X, which ends up being a very similar problem: aka it's your job to code up sane algorithms).


So I think this is a perfect example of what happens in tech interviews.

OP is using knowledge of a specific technology as a heuristic for "has experience in role x"

But this always makes me wonder, couldn't you see that experience from a resume? If the candidate filled a data science role at somewhere reputable for 3 years, and you verify that they successfully filled that role, why rely on that heuristic?

As you say testing for the specific technology, when it can be learnt in 10 minutes, does not seem logical.


Don't worry, he responded with this very data-driven explanation:

> Generally speaking, people like this have never actually dealt with large data sets


Hmm, tell us more? Databases are generally known for their good performance (under locking conditions - this is the relevant factor I guess?), after all...


> Databases are generally known for their good performance

If the data is on a filesystem, then sed, grep and cut pipelines will likely be your fastest option (Yahoo! processed petabytes of logfiles for decades that way.)

If the data is already inside a database table and indexed well, that could be fast enough. But generally speaking, the ETL is often a bottleneck. And DBAs are $$$$ compared to "the UNIX way."

Source: DBA




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: