Haha that's OK, I can rant or ramble on PS for ages ;)

For the cause of the slowness, try a Get-Content file.txt | Format-List * -Force on a single-line file and have a look at what comes out; you'd think it would be a System.String with only a Length property, but surprise, there is a lot of metadata about which file it came from and which provider the file was on. A small addition to one string, but it adds up over tens of thousands of lines, and that is the big reason Get-Content is so slow. PowerShell has user-mode filesystem / virtual file equivalents called PSProviders, so there is some overhead for starting up a cmdlet at all, then binding its parameters, then dropping through to the FileSystem provider, but the added string properties are the big overhead for file reading.
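
A rough sketch of what that looks like (the file path is just an example; the exact property list is from memory, so check it on your own machine):

    # each line comes out as a String decorated with NoteProperties by the provider
    $line = Get-Content .\file.txt | Select-Object -First 1
    $line | Get-Member -MemberType NoteProperty
    # typically shows PSChildName, PSDrive, PSParentPath, PSPath, PSProvider, ReadCount
    $line.PSPath    # full provider-qualified path of the file this line came from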

Those properties are meant to let you do convenient things further down the pipeline, e.g. filter the strings, then move the files which contained the strings you kept (a sketch of that follows below). I'm not aware of any genuinely useful use case, but I expect there is one somewhere. This has been baked in since v1, and removing it would be a breaking change on a very heavily used cmdlet, so it isn't going away.
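
A contrived sketch of that kind of pipeline; the pattern and the destination folder are made up:

    # hypothetical: move every log file that contains the word ERROR
    Get-Content .\*.log |
        Where-Object { $_ -match 'ERROR' } |
        Select-Object -ExpandProperty PSPath -Unique |
        ForEach-Object { Move-Item -LiteralPath $_ -Destination C:\Quarantine }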

The variations like Get-Content -Raw were the intended workaround: by outputting a single string, you skip much of the overhead. The other approaches, ${} and .Net methods, avoid Get-Content altogether. ${} is the syntax for a full variable name, and PowerShell has PSProviders which are a little like virtual drives, user-mode filesystems, or /proc, so you can use ${env:computername} for an environment variable, ${variable:pwd} for another way to get the working directory, and ${c:\path\to\file.txt} fits that format, so they made it do something useful.
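
Side by side, with an example path, the faster options look roughly like this:

    $text  = Get-Content C:\temp\file.txt -Raw                      # one big string, one object's worth of metadata
    $text2 = ${C:\temp\file.txt}                                    # file contents via the provider-backed variable syntax
    $lines = [System.IO.File]::ReadAllLines('C:\temp\file.txt')     # plain .Net, no provider, no extra properties
    $text3 = [System.IO.File]::ReadAllText('C:\temp\file.txt')      # likewise, as a single string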

> Does all the powershelly parsing and dynamicism really hurt that bad?

The language is interpreted, but loops which run enough times get compiled by the Dynamic Language Runtime (DLR) parts of .Net, swapped in while the code is running, and run even faster. Parts which are a thin layer over .Net run pretty fast: @{} is a System.Collections.Hashtable, and the regex engine is just the .Net regex engine with case insensitivity switched on.
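
You can see how thin that layer is straight from the prompt:

    @{}.GetType().FullName              # System.Collections.Hashtable
    'PowerShell' -match 'powershell'    # True - .Net Regex, with IgnoreCase switched on for you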

Function calls are killer slow. Don't follow good programming practice and make lots of small dedicated functions called in a loop; for speed, inline everything. Because functions can "look like" cmdlets with Get-Name -Param1 x -Param2 y, there is a HUGE overhead to parameter binding. You might have configured things so -Param2 gets its values from the metadata on the strings from Get-Content; there's a lot of searching to find the best match in the parameter handling.
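
A quick way to feel the difference yourself; the function and loop are made up, and the exact numbers will vary by machine:

    function Add-One ($n) { $n + 1 }

    Measure-Command { foreach ($i in 1..100000) { Add-One $i } }    # function call + parameter binding every iteration
    Measure-Command { foreach ($i in 1..100000) { $i + 1 } }        # same work inlined, much faster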

The pipeline overhead is quite big; code like Get-Thing | Where { .. } | Set-Thing runs pretty slowly compared to a rewrite without any pipes or with fewer cmdlets. (Get-Thing).Where{} is faster, and $result = foreach($x in Get-Thing) { if (..) { $x } } is faster still.
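
Using a plain number range to stand in for Get-Thing, the same filter three ways; timings vary, but the ordering generally matches the above:

    Measure-Command { $r = 1..100000 | Where-Object { $_ % 2 -eq 0 } }
    Measure-Command { $r = (1..100000).Where({ $_ % 2 -eq 0 }) }
    Measure-Command { $r = foreach ($x in 1..100000) { if ($x % 2 -eq 0) { $x } } }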

The slow bits IMO are where PowerShell does more work behind the scenes to add shell convenience rather than programming convenience; the great bit is "it's an object shell", and the downside is "making objects has costs". Compare SQL-style code: "select x, y from data" describes the output first, so the database engine can skip the data you don't want. Shells and PowerShell have that backwards: you "get-data | select x,y", and the engine has no way to avoid generating all the data first only to throw it away later. Since PS tries for convenience, its cmdlets err on the side of generating more data rather than less, which makes this effect worse.
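
A small illustration of that ordering (the directory is just an example): even though only two columns survive, the cmdlet has already built a full FileInfo object for every file before Select-Object discards the rest.

    Get-ChildItem C:\Windows\System32 -File | Select-Object Name, Length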

I don't want to end up writing C# for everything either :)



Great reply! This has helped me tremendously!
