Help!! Regex parsing a custom string syntax in PHP

Posted in Help the coder! on Feb 15, 2009 at 18:51 IST

meenus
Rank: 11

This is more of a generic regex question than a PHP-specific one.

I am given different strings that may look like:

A/B/PA ID U/C/D

And I'm trying to extract the segment in the middle slashes that has spaces ("/PA ID U") using:

preg_match('/(\/PA .+)(\/.+|$)/', $string, $matches);

However, instead of the "/PA ID U" I expected, the first group captures "/PA ID U/C/D".

How can I make it prioritize matching "/.+" over "$" in that last group?

I need that last group to match either another "/something" or the end of the string, because the input varies a lot. If I only match "/.+", I can't extract "/PA ID U" when it is the last segment, as in "A/B/PA ID U".

Basically, I need to be able to extract specific segments like so:

Given: "A/B/PA ID U/PA ID U/C/D"
Extract: (A), (B), (PA ID U), (PA ID U), (C), (D)

I'm trying to avoid split() or explode() because I would then have to match the "PA ID U" pattern in a separate step. Besides extracting the slash-separated segments, I also need to validate that each substring matches a specific pattern.

Posted by meenus on Sunday, February 15, 2009, 6:51 pm

bootrecord
Rank: 371

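I think splitting on the slash is the most effective way to accomplish what you want. Note that split() was deprecated in PHP 5.3 and removed in PHP 7, so on current versions use explode(), which takes a plain string delimiter:

explode('/', $string);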

See: php manual
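
To address the validation concern from the question, each piece can be checked after splitting. The following is only a sketch: the function name parseSegments and the validation pattern (a single uppercase letter, or "PA " followed by uppercase words) are assumptions made for illustration, not rules stated in this thread.

```php
<?php
// Hypothetical helper: split on "/" and validate every segment.
// The pattern below is an assumed rule; replace it with your real ones.
function parseSegments($input)
{
    $segments = explode('/', $input);
    foreach ($segments as $segment) {
        // Either one uppercase letter, or "PA " plus space-separated words.
        if (!preg_match('/^(?:[A-Z]|PA [A-Z]+(?: [A-Z]+)*)$/', $segment)) {
            return null; // at least one segment failed validation
        }
    }
    return $segments;
}

print_r(parseSegments('A/B/PA ID U/PA ID U/C/D'));
```

With this approach the "PA ID U" check is still a separate step, but it stays in one place and the function returns null as soon as any segment is malformed.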

Posted by bootrecord on Friday, February 20, 2009, 3:27 am

lavan
Rank: 470

necramirez,

(\w+\s?)+

should work
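
As a rough sketch of how this pattern could be applied, assuming it is run with preg_match_all() over the whole string (the post does not say which function to use) and using variable names picked for illustration:

```php
<?php
$string = 'A/B/PA ID U/PA ID U/C/D';

// $found[0] holds the full matches. $found[1] only keeps the last
// repetition of the group for each match, so it is ignored here.
preg_match_all('/(\w+\s?)+/', $string, $found);
$segments = $found[0];

print_r($segments);
```

Note that \w+ accepts any word characters, so this extracts the segments but does not validate that they follow a particular format. A segment followed by whitespace before the slash would also keep that trailing whitespace.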

Posted by lavan on Friday, February 20, 2009, 10:45 pm

cheetah
Rank: 253

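Your regular expression is not working because the first .+ is greedy: it consumes everything up to the end of the string, and the second group is then satisfied by the $ alternative. You can fix it by making that first .+ non-greedy with a ? modifier:

preg_match('/(\/PA .+?)(\/.+|$)/', $string, $matches);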

You could alternatively do:

'/\/(PA [^\/]+)(\/.+|$)/'

I moved the slash outside of the parentheses so that it isn't captured (I presume you're not interested in the slash). The [^\/]+ matches one or more characters that are not a slash, so the capture stops at the next slash or at the end of the string.
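
For a quick check of both patterns, here is a small sketch that runs them against the two sample strings from the question. The variable names are mine and chosen only for illustration.

```php
<?php
$inputs  = array('A/B/PA ID U/C/D', 'A/B/PA ID U');
$lazy    = '/(\/PA .+?)(\/.+|$)/';     // non-greedy first group
$negated = '/\/(PA [^\/]+)(\/.+|$)/';  // negated character class

$results = array();
foreach ($inputs as $input) {
    preg_match($lazy, $input, $m1);
    preg_match($negated, $input, $m2);
    // Index 1 is the first capture group of each pattern.
    $results[$input] = array($m1[1], $m2[1]);
}

print_r($results);
```

For both inputs the non-greedy version captures "/PA ID U" (with the leading slash) and the negated-class version captures "PA ID U", including the case where the segment is the last one in the string.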

Posted by cheetah on Saturday, February 21, 2009, 9:48 pm