Class: HTML::Selector
In: vendor/rails/actionpack/lib/action_controller/vendor/html-scanner/html/selector.rb
Parent: Object
Selects HTML elements using CSS 2 selectors.
The Selector class uses CSS selector expressions to match and select HTML elements.
For example:
selector = HTML::Selector.new "form.login[action=/login]"
creates a new selector that matches any form element with the class login and an attribute action with the value /login.
Use the match method to determine if an element matches the selector.
For simple selectors, the method returns an array containing that element, or nil if the element does not match. For complex selectors (see below), the method returns an array of all matched elements, or nil if no match is found.
For example:
if selector.match(element)
  puts "Element is a login form"
end
Use the select method to select all matching elements starting with one element and going through all children in depth-first order.
This method returns an array of all matching elements, or an empty array if no match is found.
For example:
selector = HTML::Selector.new "input[type=text]"
matches = selector.select(element)
matches.each do |match|
  puts "Found text field with name #{match.attributes['name']}"
end
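Selectors can match elements using any of the following criteria: name or * (element name, or any element), #id (identifier), .class (class name), [attr] or [attr=value] (attribute), :pseudo-class (pseudo class) and :not(expr) (negation).
When using a combination of the above, the element name comes first, followed by identifier, class names, attributes, pseudo classes and negation in any order. Do not separate these parts with spaces! Space separation is used for descendant selectors.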
For example:
selector = HTML::Selector.new "form.login[action=/login]"
The matched element must be of type form and have the class login. It may have other classes, but the class login is required to match. It must also have an attribute called action with the value /login.
This selector will match the following element:
<form class="login form" method="post" action="/login">
but will not match the element:
<form method="post" action="/logout">
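The check described above can be approximated outside the library. The following is a minimal sketch, not the library's code: it represents each element as a plain hash (a hypothetical stand-in for a parsed HTML node) and applies a tag-name comparison plus one regular expression per required attribute. The names LOGIN, LOGOUT, TAG_NAME, ATTRIBUTES and simple_match? are illustrative assumptions.

```ruby
# Hypothetical stand-ins for parsed elements: a tag name plus attributes.
LOGIN  = { name: "form", attributes: { "class" => "login form", "method" => "post", "action" => "/login" } }
LOGOUT = { name: "form", attributes: { "method" => "post", "action" => "/logout" } }

# Requirements equivalent to "form.login[action=/login]" (assumed layout).
TAG_NAME   = "form"
ATTRIBUTES = [
  ["class",  /(^|\s)login($|\s)/], # class list must contain the word "login"
  ["action", /\A\/login\z/]        # action must equal "/login" exactly
]

def simple_match?(element)
  return false unless element[:name] == TAG_NAME
  # A missing attribute is nil, and nil =~ pattern is nil (no match).
  ATTRIBUTES.all? { |attr, pattern| element[:attributes][attr] =~ pattern }
end

puts simple_match?(LOGIN)   # true
puts simple_match?(LOGOUT)  # false
```

The first element passes because its class list contains login alongside form; the second fails on both the class and the action requirement.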
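Several operators are supported for matching attributes: = matches the full value, ~= matches a space-separated word within the value, ^= matches the beginning of the value, $= matches the end of the value, *= matches a substring, and |= matches the first space-separated item of the value. An attribute name with no operator and no value only checks that the attribute exists.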
For example, the following two selectors match the same element:
#my_id
[id=my_id]
and so do the following two selectors:
.my_class
[class~=my_class]
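Complex selectors use a combination of expressions to match elements: a comma separates alternative selectors, + matches the element immediately following the first, ~ matches any element following the first, > matches a child of the first, and a space matches any descendant of the first.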
Since children and sibling selectors may match more than one element given the first element, the match method may return more than one match.
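Pseudo classes were introduced in CSS 3. They are most often used to select elements in a given position: :root, :nth-child(b) and :nth-child(an+b) with their :nth-last-child, :nth-of-type and :nth-last-of-type variants, :first-child, :last-child, :first-of-type, :last-of-type, :only-child and :only-of-type. The :empty and :content(text) pseudo classes match on an element's content instead.
As you can see, the :nth-child pseudo class and its variants can get quite tricky, and the CSS specification doesn't do a much better job of explaining them. But after reading the examples and trying a few combinations, it's easy to figure out.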
For example:
table tr:nth-child(odd)
Selects every second row in the table starting with the first one.
div p:nth-child(4)
Selects the fourth paragraph in the div, but not if the div contains other elements, since those are also counted.
div p:nth-of-type(4)
Selects the fourth paragraph in the div, counting only paragraphs, and ignoring all other elements.
div p:nth-of-type(-n+4)
Selects the first four paragraphs, ignoring all others.
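The difference between counting all children and counting only children of one type is easy to check by hand. The sketch below is an illustration only, assuming a hypothetical div whose children are one h1 followed by four paragraphs; CHILDREN and position are made-up names, not part of the library.

```ruby
# Hypothetical child list of a <div>: element names in document order.
CHILDREN = %w[h1 p p p p]

# 1-based position of the child at +index+, counting every sibling
# (nth-child) or only siblings with the same name (nth-of-type).
def position(children, index, of_type)
  name = children[index]
  preceding = children[0...index]
  preceding = preceding.select { |other| other == name } if of_type
  preceding.size + 1
end

last_p = CHILDREN.size - 1              # index of the fourth paragraph
puts position(CHILDREN, last_p, false)  # 5
puts position(CHILDREN, last_p, true)   # 4
# The third paragraph is the fourth child overall.
puts position(CHILDREN, 3, false)       # 4
```

With this layout, div p:nth-child(4) would pick the third paragraph, while div p:nth-of-type(4) would pick the fourth.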
And you can always select an element that matches one set of rules but not another using :not. For example:
p:not(.post)
Matches all paragraphs that do not have the class post.
You can use substitution with identifiers, class names and element values. A substitution takes the form of a question mark (?) and uses the next value in the argument list following the CSS expression.
The substitution value may be a string or a regular expression. All other values are converted to strings.
For example:
selector = HTML::Selector.new "#?", /^\d+$/
matches any element whose identifier consists of one or more digits.
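To make the substitution rule concrete, here is a minimal sketch of how a ? token might be resolved for an identifier. It mirrors the identifier branch of the simple_selector listing further down, but id_pattern is a hypothetical helper written for this illustration, not a method of the class.

```ruby
# Builds the pattern used to test an id attribute. +token+ is the text
# after "#" in the expression; "?" takes the next substitution value.
def id_pattern(token, values)
  id = token == "?" ? values.shift : token
  # Regular expressions are used as given; anything else is converted
  # to a string and must match the whole attribute value.
  id.is_a?(Regexp) ? id : Regexp.new("^#{Regexp.escape(id.to_s)}$")
end

digits = id_pattern("?", [/^\d+$/])
puts digits.match?("42")       # true
puts digits.match?("item-42")  # false

literal = id_pattern("?", [7])
puts literal.match?("7")       # true
puts literal.match?("77")      # false
```

Because values.shift removes the value it returns, each ? consumes the next argument in order.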
Creates a new selector for the given class name.
# File vendor/rails/actionpack/lib/action_controller/vendor/html-scanner/html/selector.rb, line 213
def for_class(cls)
  self.new([".?", cls])
end
Creates a new selector for the given id.
# File vendor/rails/actionpack/lib/action_controller/vendor/html-scanner/html/selector.rb, line 222
def for_id(id)
  self.new(["#?", id])
end
Creates a new selector from a CSS 2 selector expression.
The first argument is the selector expression. All other arguments are used for value substitution.
Raises InvalidSelectorError if the selector expression is invalid (the listing below raises ArgumentError for an empty or unparsable expression).
# File vendor/rails/actionpack/lib/action_controller/vendor/html-scanner/html/selector.rb, line 238
def initialize(selector, *values)
  raise ArgumentError, "CSS expression cannot be empty" if selector.empty?
  @source = ""
  values = values[0] if values.size == 1 && values[0].is_a?(Array)
  # We need a copy to determine if we failed to parse, and also
  # preserve the original pass by-ref statement.
  statement = selector.strip.dup
  # Create a simple selector, along with negation.
  simple_selector(statement, values).each { |name, value| instance_variable_set("@#{name}", value) }

  # Alternative selector.
  if statement.sub!(/^\s*,\s*/, "")
    second = Selector.new(statement, values)
    (@alternates ||= []) << second
    # If there are alternate selectors, we group them in the top selector.
    if alternates = second.instance_variable_get(:@alternates)
      second.instance_variable_set(:@alternates, nil)
      @alternates.concat alternates
    end
    @source << " , " << second.to_s
  # Sibling selector: create a dependency into second selector that will
  # match element immediately following this one.
  elsif statement.sub!(/^\s*\+\s*/, "")
    second = next_selector(statement, values)
    @depends = lambda do |element, first|
      if element = next_element(element)
        second.match(element, first)
      end
    end
    @source << " + " << second.to_s
  # Adjacent selector: create a dependency into second selector that will
  # match all elements following this one.
  elsif statement.sub!(/^\s*~\s*/, "")
    second = next_selector(statement, values)
    @depends = lambda do |element, first|
      matches = []
      while element = next_element(element)
        if subset = second.match(element, first)
          if first && !subset.empty?
            matches << subset.first
            break
          else
            matches.concat subset
          end
        end
      end
      matches.empty? ? nil : matches
    end
    @source << " ~ " << second.to_s
  # Child selector: create a dependency into second selector that will
  # match a child element of this one.
  elsif statement.sub!(/^\s*>\s*/, "")
    second = next_selector(statement, values)
    @depends = lambda do |element, first|
      matches = []
      element.children.each do |child|
        if child.tag? && subset = second.match(child, first)
          if first && !subset.empty?
            matches << subset.first
            break
          else
            matches.concat subset
          end
        end
      end
      matches.empty? ? nil : matches
    end
    @source << " > " << second.to_s
  # Descendant selector: create a dependency into second selector that
  # will match all descendant elements of this one. Note,
  elsif statement =~ /^\s+\S+/ && statement != selector
    second = next_selector(statement, values)
    @depends = lambda do |element, first|
      matches = []
      stack = element.children.reverse
      while node = stack.pop
        next unless node.tag?
        if subset = second.match(node, first)
          if first && !subset.empty?
            matches << subset.first
            break
          else
            matches.concat subset
          end
        elsif children = node.children
          stack.concat children.reverse
        end
      end
      matches.empty? ? nil : matches
    end
    @source << " " << second.to_s
  else
    # The last selector is where we check that we parsed
    # all the parts.
    unless statement.empty? || statement.strip.empty?
      raise ArgumentError, "Invalid selector: #{statement}"
    end
  end
end
Matches an element against the selector.
For a simple selector this method returns an array with the element if the element matches, nil otherwise.
For a complex selector (sibling and descendant) this method returns an array with all matching elements, nil if no match is found.
Pass first_only = true if you are only interested in the first element.
For example:
if selector.match(element)
  puts "Element is a login form"
end
# File vendor/rails/actionpack/lib/action_controller/vendor/html-scanner/html/selector.rb, line 357
def match(element, first_only = false)
  # Match element if no element name or element name same as element name
  if matched = (!@tag_name || @tag_name == element.name)
    # No match if one of the attribute matches failed
    for attr in @attributes
      if element.attributes[attr[0]] !~ attr[1]
        matched = false
        break
      end
    end
  end

  # Pseudo class matches (nth-child, empty, etc).
  if matched
    for pseudo in @pseudo
      unless pseudo.call(element)
        matched = false
        break
      end
    end
  end

  # Negation. Same rules as above, but we fail if a match is made.
  if matched && @negation
    for negation in @negation
      if negation[:tag_name] == element.name
        matched = false
      else
        for attr in negation[:attributes]
          if element.attributes[attr[0]] =~ attr[1]
            matched = false
            break
          end
        end
      end
      if matched
        for pseudo in negation[:pseudo]
          if pseudo.call(element)
            matched = false
            break
          end
        end
      end
      break unless matched
    end
  end

  # If element matched but depends on another element (child,
  # sibling, etc), apply the dependent matches instead.
  if matched && @depends
    matches = @depends.call(element, first_only)
  else
    matches = matched ? [element] : nil
  end

  # If this selector is part of the group, try all the alternative
  # selectors (unless first_only).
  if @alternates && (!first_only || !matches)
    @alternates.each do |alternate|
      break if matches && first_only
      if subset = alternate.match(element, first_only)
        if matches
          matches.concat subset
        else
          matches = subset
        end
      end
    end
  end

  matches
end
Return the next element after this one. Skips sibling text nodes.
With the name argument, returns the next element with that name, skipping other sibling elements.
# File vendor/rails/actionpack/lib/action_controller/vendor/html-scanner/html/selector.rb, line 487
def next_element(element, name = nil)
  if siblings = element.parent.children
    found = false
    siblings.each do |node|
      if node.equal?(element)
        found = true
      elsif found && node.tag?
        return node if (name.nil? || node.name == name)
      end
    end
  end
  nil
end
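The sibling scan can be tried in isolation. What follows is a minimal sketch that uses the same approach as the listing, assuming a hypothetical Node struct and build_parent helper in place of the library's node classes.

```ruby
# Hypothetical node type: +tag+ is true for elements, false for text.
Node = Struct.new(:name, :tag, :parent, :children) do
  def tag?
    tag
  end
end

# Builds a parent element whose children are given as [name, tag] pairs.
def build_parent(*kids)
  parent = Node.new("ul", true, nil, [])
  kids.each do |name, tag|
    parent.children << Node.new(name, tag, parent, [])
  end
  parent
end

# Scan the parent's children; once the starting node has been seen,
# return the first element that qualifies.
def next_element(element, name = nil)
  found = false
  element.parent.children.each do |node|
    if node.equal?(element)
      found = true
    elsif found && node.tag?
      return node if name.nil? || node.name == name
    end
  end
  nil
end

LIST = build_parent(["li", true], ["text", false], ["hr", true], ["li", true])
first = LIST.children[0]
puts next_element(first).name                            # hr
puts next_element(first, "li").equal?(LIST.children[3])  # true
puts next_element(LIST.children[3]).inspect              # nil
```

The text node between the first li and the hr is skipped because tag? is false for it, and the name argument skips the hr in the second call.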
Selects and returns an array with all matching elements, beginning with one node and traversing through all children depth-first. Returns an empty array if no match is found.
The root node may be any element in the document, or the document itself.
For example:
selector = HTML::Selector.new "input[type=text]"
matches = selector.select(element)
matches.each do |match|
  puts "Found text field with name #{match.attributes['name']}"
end
# File vendor/rails/actionpack/lib/action_controller/vendor/html-scanner/html/selector.rb, line 447
def select(root)
  matches = []
  stack = [root]
  while node = stack.pop
    if node.tag? && subset = match(node, false)
      subset.each do |match|
        matches << match unless matches.any? { |item| item.equal?(match) }
      end
    elsif children = node.children
      stack.concat children.reverse
    end
  end
  matches
end
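The traversal order is easier to see on a small tree. This sketch reproduces the stack-based walk from the listing with a hypothetical Elem struct and a block in place of the selector match; select_nodes and TREE are illustrative names only.

```ruby
# Hypothetical element type with a name and child elements.
Elem = Struct.new(:name, :children)

# Collects every node for which the block returns true, in document order.
# As in the listing, children are pushed in reverse so the first child is
# popped first, and a matched node is not descended into.
def select_nodes(root)
  matches = []
  stack = [root]
  while (node = stack.pop)
    if yield(node)
      matches << node
    else
      stack.concat node.children.reverse
    end
  end
  matches
end

TREE = Elem.new("form", [
  Elem.new("p", [Elem.new("input", []), Elem.new("b", [])]),
  Elem.new("input", [Elem.new("input", [])]),
  Elem.new("div", [Elem.new("input", [])])
])

found = select_nodes(TREE) { |node| node.name == "input" }
puts found.size  # 3
```

Only three of the four input nodes are collected: the one nested inside another matching input is never visited, because the walk descends into children only when the current node does not match.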
Similar to select but returns the first matching element. Returns nil if no element matches the selector.
# File vendor/rails/actionpack/lib/action_controller/vendor/html-scanner/html/selector.rb, line 465
def select_first(root)
  stack = [root]
  while node = stack.pop
    if node.tag? && subset = match(node, true)
      return subset.first if !subset.empty?
    elsif children = node.children
      stack.concat children.reverse
    end
  end
  nil
end
Create a regular expression to match an attribute value based on the equality operator (=, ^=, |=, etc).
# File vendor/rails/actionpack/lib/action_controller/vendor/html-scanner/html/selector.rb, line 681
def attribute_match(equality, value)
  regexp = value.is_a?(Regexp) ? value : Regexp.escape(value.to_s)
  case equality
    when "=" then
      # Match the attribute value in full
      Regexp.new("^#{regexp}$")
    when "~=" then
      # Match a space-separated word within the attribute value
      Regexp.new("(^|\s)#{regexp}($|\s)")
    when "^="
      # Match the beginning of the attribute value
      Regexp.new("^#{regexp}")
    when "$="
      # Match the end of the attribute value
      Regexp.new("#{regexp}$")
    when "*="
      # Match substring of the attribute value
      regexp.is_a?(Regexp) ? regexp : Regexp.new(regexp)
    when "|=" then
      # Match the first space-separated item of the attribute value
      Regexp.new("^#{regexp}($|\s)")
    else
      raise InvalidSelectorError, "Invalid operation/value" unless value.empty?
      # Match all attributes values (existence check)
      //
  end
end
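The operator-to-pattern mapping can be exercised on its own. The following is a simplified sketch rather than the method above: operator_pattern is a hypothetical name, it accepts only literal values, and it uses regexp literals so that \s means any whitespace character.

```ruby
# Maps an attribute operator to a regular expression for a literal value.
def operator_pattern(operator, value)
  text = Regexp.escape(value.to_s)
  case operator
  when "="  then /\A#{text}\z/          # whole value
  when "~=" then /(^|\s)#{text}($|\s)/  # one whitespace-separated word
  when "^=" then /\A#{text}/            # prefix
  when "$=" then /#{text}\z/            # suffix
  when "*=" then /#{text}/              # substring
  when "|=" then /\A#{text}($|\s)/      # first whitespace-separated item
  else Regexp.new("")                   # no operator: matches any value
  end
end

puts operator_pattern("~=", "login").match?("login form")  # true
puts operator_pattern("~=", "log").match?("login form")    # false
puts operator_pattern("^=", "/log").match?("/logout")      # true
puts operator_pattern("=", "/login").match?("/logout")     # false
```

Escaping the value first keeps characters such as . or / from being read as regular expression syntax.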
Called to create a dependent selector (sibling, descendant, etc). Passes the remainder of the statement, which will eventually be reduced to zero, and an array of substitution values.
This method is called from four places, so it helps to put it here for reuse. The only logic deals with the need to detect comma separators (alternates) and apply them to the selector group of the top selector.
# File vendor/rails/actionpack/lib/action_controller/vendor/html-scanner/html/selector.rb, line 794
def next_selector(statement, values)
  second = Selector.new(statement, values)
  # If there are alternate selectors, we group them in the top selector.
  if alternates = second.instance_variable_get(:@alternates)
    second.instance_variable_set(:@alternates, nil)
    (@alternates ||= []).concat alternates
  end
  second
end
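Returns a lambda that can match an element against the nth-child pseudo class, given the following arguments: a and b (the values of the an+b expression), of_type (true to count only siblings with the same name as the element) and reverse (true to count from the last sibling).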
# File vendor/rails/actionpack/lib/action_controller/vendor/html-scanner/html/selector.rb, line 716
def nth_child(a, b, of_type, reverse)
  # a = 0 means select at index b, if b = 0 nothing selected
  return lambda { |element| false } if a == 0 && b == 0
  # a < 0 and b < 0 will never match against an index
  return lambda { |element| false } if a < 0 && b < 0
  b = a + b + 1 if b < 0 # b < 0 just picks last element from each group
  b -= 1 unless b == 0 # b == 0 is same as b == 1, otherwise zero based
  lambda do |element|
    # Element must be inside parent element.
    return false unless element.parent && element.parent.tag?
    index = 0
    # Get siblings, reverse if counting from last.
    siblings = element.parent.children
    siblings = siblings.reverse if reverse
    # Match element name if of-type, otherwise ignore name.
    name = of_type ? element.name : nil
    found = false
    for child in siblings
      # Skip text nodes/comments.
      if child.tag? && (name == nil || child.name == name)
        if a == 0
          # Shortcut when a == 0 no need to go past count
          if index == b
            found = child.equal?(element)
            break
          end
        elsif a < 0
          # Only look for first b elements
          break if index > b
          if child.equal?(element)
            found = (index % a) == 0
            break
          end
        else
          # Otherwise, break if child found and count == an+b
          if child.equal?(element)
            found = (index % a) == b
            break
          end
        end
        index += 1
      end
    end
    found
  end
end
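For comparison, the an+b rule as CSS defines it can be written in a few lines. This sketch is not the zero-based index arithmetic used in the listing; it checks the definition directly, and nth_position? and positions are hypothetical helper names.

```ruby
# True when the 1-based +position+ equals a*n + b for some integer n >= 0.
def nth_position?(position, a, b)
  return position == b if a == 0
  diff = position - b
  (diff % a).zero? && (diff / a) >= 0
end

# Positions among +count+ siblings that satisfy an+b.
def positions(a, b, count)
  (1..count).select { |position| nth_position?(position, a, b) }
end

p positions(2, 1, 6)   # odd:  [1, 3, 5]
p positions(2, 0, 6)   # even: [2, 4, 6]
p positions(0, 4, 6)   # 4:    [4]
p positions(-1, 4, 6)  # -n+4: [1, 2, 3, 4]
```

A negative a counts downward from b, which is why -n+4 selects the first four positions and nothing after them.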
Creates an only-child lambda. Pass of_type to look only at elements of the same type.
# File vendor/rails/actionpack/lib/action_controller/vendor/html-scanner/html/selector.rb, line 766
def only_child(of_type)
  lambda do |element|
    # Element must be inside parent element.
    return false unless element.parent && element.parent.tag?
    name = of_type ? element.name : nil
    other = false
    for child in element.parent.children
      # Skip text nodes/comments.
      if child.tag? && (name == nil || child.name == name)
        unless child.equal?(element)
          other = true
          break
        end
      end
    end
    !other
  end
end
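The same test can be expressed over a plain list of siblings. A minimal sketch, assuming siblings are given as hypothetical [name, tag] pairs rather than library nodes; SIBLINGS and only_child? are illustrative names.

```ruby
# Hypothetical sibling list: [name, tag] pairs in document order, where
# tag is false for text nodes.
SIBLINGS = [["text", false], ["h1", true], ["text", false], ["p", true]]

# True when no other element sibling exists (with of_type: no other
# element sibling of the same name).
def only_child?(siblings, index, of_type)
  name = siblings[index][0]
  siblings.each_with_index.none? do |(other_name, tag), i|
    i != index && tag && (!of_type || other_name == name)
  end
end

puts only_child?(SIBLINGS, 1, false)  # false
puts only_child?(SIBLINGS, 1, true)   # true
```

The h1 is not an only child because the p is also an element, but it is the only element of its type; text nodes never count either way.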
Creates a simple selector given the statement and array of substitution values.
Returns a hash with the values tag_name, attributes, pseudo (classes) and negation.
Called the first time with can_negate true to allow negation. Called a second time with false since negation cannot be negated.
# File vendor/rails/actionpack/lib/action_controller/vendor/html-scanner/html/selector.rb, line 514
def simple_selector(statement, values, can_negate = true)
  tag_name = nil
  attributes = []
  pseudo = []
  negation = []

  # Element name. (Note that in negation, this can come at
  # any order, but for simplicity we allow if only first).
  statement.sub!(/^(\*|[[:alpha:]][\w\-]*)/) do |match|
    match.strip!
    tag_name = match.downcase unless match == "*"
    @source << match
    "" # Remove
  end

  # Get identifier, class, attribute name, pseudo or negation.
  while true
    # Element identifier.
    next if statement.sub!(/^#(\?|[\w\-]+)/) do |match|
      id = $1
      if id == "?"
        id = values.shift
      end
      @source << "##{id}"
      id = Regexp.new("^#{Regexp.escape(id.to_s)}$") unless id.is_a?(Regexp)
      attributes << ["id", id]
      "" # Remove
    end

    # Class name.
    next if statement.sub!(/^\.([\w\-]+)/) do |match|
      class_name = $1
      @source << ".#{class_name}"
      class_name = Regexp.new("(^|\s)#{Regexp.escape(class_name)}($|\s)") unless class_name.is_a?(Regexp)
      attributes << ["class", class_name]
      "" # Remove
    end

    # Attribute value.
    next if statement.sub!(/^\[\s*([[:alpha:]][\w\-]*)\s*((?:[~|^$*])?=)?\s*('[^']*'|"[^*]"|[^\]]*)\s*\]/) do |match|
      name, equality, value = $1, $2, $3
      if value == "?"
        value = values.shift
      else
        # Handle single and double quotes.
        value.strip!
        if (value[0] == ?" || value[0] == ?') && value[0] == value[-1]
          value = value[1..-2]
        end
      end
      @source << "[#{name}#{equality}'#{value}']"
      attributes << [name.downcase.strip, attribute_match(equality, value)]
      "" # Remove
    end

    # Root element only.
    next if statement.sub!(/^:root/) do |match|
      pseudo << lambda do |element|
        element.parent.nil? || !element.parent.tag?
      end
      @source << ":root"
      "" # Remove
    end

    # Nth-child including last and of-type.
    next if statement.sub!(/^:nth-(last-)?(child|of-type)\((odd|even|(\d+|\?)|(-?\d*|\?)?n([+\-]\d+|\?)?)\)/) do |match|
      reverse = $1 == "last-"
      of_type = $2 == "of-type"
      @source << ":nth-#{$1}#{$2}("
      case $3
        when "odd"
          pseudo << nth_child(2, 1, of_type, reverse)
          @source << "odd)"
        when "even"
          pseudo << nth_child(2, 2, of_type, reverse)
          @source << "even)"
        when /^(\d+|\?)$/ # b only
          b = ($1 == "?" ? values.shift : $1).to_i
          pseudo << nth_child(0, b, of_type, reverse)
          @source << "#{b})"
        when /^(-?\d*|\?)?n([+\-]\d+|\?)?$/
          a = ($1 == "?" ? values.shift :
               $1 == "" ? 1 : $1 == "-" ? -1 : $1).to_i
          b = ($2 == "?" ? values.shift : $2).to_i
          pseudo << nth_child(a, b, of_type, reverse)
          @source << (b >= 0 ? "#{a}n+#{b})" : "#{a}n#{b})")
        else
          raise ArgumentError, "Invalid nth-child #{match}"
      end
      "" # Remove
    end
    # First/last child (of type).
    next if statement.sub!(/^:(first|last)-(child|of-type)/) do |match|
      reverse = $1 == "last"
      of_type = $2 == "of-type"
      pseudo << nth_child(0, 1, of_type, reverse)
      @source << ":#{$1}-#{$2}"
      "" # Remove
    end
    # Only child (of type).
    next if statement.sub!(/^:only-(child|of-type)/) do |match|
      of_type = $1 == "of-type"
      pseudo << only_child(of_type)
      @source << ":only-#{$1}"
      "" # Remove
    end

    # Empty: no child elements or meaningful content (whitespaces
    # are ignored).
    next if statement.sub!(/^:empty/) do |match|
      pseudo << lambda do |element|
        empty = true
        for child in element.children
          if child.tag? || !child.content.strip.empty?
            empty = false
            break
          end
        end
        empty
      end
      @source << ":empty"
      "" # Remove
    end
    # Content: match the text content of the element, stripping
    # leading and trailing spaces.
    next if statement.sub!(/^:content\(\s*(\?|'[^']*'|"[^"]*"|[^)]*)\s*\)/) do |match|
      content = $1
      if content == "?"
        content = values.shift
      elsif (content[0] == ?" || content[0] == ?') && content[0] == content[-1]
        content = content[1..-2]
      end
      @source << ":content('#{content}')"
      content = Regexp.new("^#{Regexp.escape(content.to_s)}$") unless content.is_a?(Regexp)
      pseudo << lambda do |element|
        text = ""
        for child in element.children
          unless child.tag?
            text << child.content
          end
        end
        text.strip =~ content
      end
      "" # Remove
    end

    # Negation. Create another simple selector to handle it.
    if statement.sub!(/^:not\(\s*/, "")
      raise ArgumentError, "Double negatives are not missing feature" unless can_negate
      @source << ":not("
      negation << simple_selector(statement, values, false)
      raise ArgumentError, "Negation not closed" unless statement.sub!(/^\s*\)/, "")
      @source << ")"
      next
    end

    # No match: moving on.
    break
  end

  # Return hash. The keys are mapped to instance variables.
  {:tag_name=>tag_name, :attributes=>attributes, :pseudo=>pseudo, :negation=>negation}
end