---
title: HTMLListParser
keywords: fastai
sidebar: home_sidebar
nb_path: "nbs/indexers.NoteListIndexer.Parser.ipynb"
---
{% raw %}

class HTMLListParser

HTMLListParser()

Extracts lists from HTML data generated by an HTML text editor such as Evernote.

{% endraw %} {% raw %}

HTMLListParser.get_lists

HTMLListParser.get_lists(note)

Extracts lists from a note
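
To give a feeling for what extracting lists from HTML involves, here is a minimal sketch that collects the items of every `<ul>` using only the standard library. The names `ULCollector` and `extract_ul_lists` are hypothetical and not part of this package. This is not the implementation behind `get_lists`, and it may treat nested or empty items differently.

```python
from html.parser import HTMLParser


class ULCollector(HTMLParser):
    """Hypothetical helper: collects the item texts of each <ul> element."""

    def __init__(self):
        super().__init__()
        self.lists = []   # finished lists, in the order their </ul> is seen
        self._stack = []  # one list of item strings per currently open <ul>

    def handle_starttag(self, tag, attrs):
        if tag == "ul":
            self._stack.append([])
        elif tag == "li" and self._stack:
            # Start a new, still empty item in the innermost open list
            self._stack[-1].append("")

    def handle_data(self, data):
        # Text only counts once the innermost open list has an item
        if self._stack and self._stack[-1]:
            self._stack[-1][-1] += data

    def handle_endtag(self, tag):
        if tag == "ul" and self._stack:
            items = [item.strip() for item in self._stack.pop()]
            items = [item for item in items if item]  # drop empty items
            if items:
                self.lists.append(items)


def extract_ul_lists(html):
    """Return a list of lists with the item texts of every <ul> in html."""
    collector = ULCollector()
    collector.feed(html)
    collector.close()
    return collector.lists


sample = (
    "<ul><li>Buy groceries</li>"
    "<li>Call john<ul><li>He really needs to pick up</li></ul></li>"
    "<li><br/></li><li>Do the taxes</li></ul>"
)
print(extract_ul_lists(sample))
```

Because a list is stored when its closing tag is seen, a nested list appears before the list that contains it.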

{% endraw %} {% raw %}

HTMLListParser.get_unformatted_lists

HTMLListParser.get_unformatted_lists(note, txt, parsed)

Retrieves lists that are not marked up with HTML list tags. There are two cases: 1) multi-line lists prefixed with a title keyword (e.g. "Buy:", "Read:"); 2) single-element, single-line lists.
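
The first case, a title keyword followed by one item per line, can be sketched as follows. This is only an illustration that works on plain text; it assumes line breaks have already been converted to newlines, that lists are separated by blank lines, and that the keyword set is known in advance. `TITLE_KEYWORDS`, `split_blocks` and `find_titled_lists` are hypothetical names, not part of this package.

```python
# Assumed keyword set; the real parser may recognise different titles
TITLE_KEYWORDS = {"buy", "read", "watch", "do"}


def split_blocks(text):
    """Split text into blocks of consecutive non-empty lines."""
    blocks, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if line:
            current.append(line)
        elif current:
            # A blank line closes the current block
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


def find_titled_lists(text):
    """Return (title, items) pairs for blocks that start with a keyword."""
    results = []
    for block in split_blocks(text):
        title = block[0].rstrip(":").strip()
        # A multi-line list needs a known title plus at least one item
        if len(block) > 1 and title.lower() in TITLE_KEYWORDS:
            results.append((title, block[1:]))
    return results


text = (
    "Read\nThe Great Gatsby\nAlice's Adventures in Wonderland\n\n"
    "Buy\ngroceries\nShoes\n\n"
    "Just a sentence.\n"
)
print(find_titled_lists(text))
```

Blocks with a single line are skipped here, since they belong to the second case.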

    {% endraw %} {% raw %}

    HTMLListParser.get_single_line_list

    HTMLListParser.get_single_line_list(elem)

    Gets single-line lists, for example 'read: great book title'.

    {% endraw %}
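
A single-line list such as 'read: great book title' can be recognised with a small regular expression. The sketch below is an assumption about how such a line could be split into a title and one item. The names `SINGLE_LINE_LIST` and `parse_single_line_list` are hypothetical, and this is not the code used by `get_single_line_list`.

```python
import re

# "<word>: <item>" on one line; the item must contain a non-space character
SINGLE_LINE_LIST = re.compile(
    r"^\s*(?P<title>[A-Za-z]+)\s*:\s*(?P<item>\S.*?)\s*$"
)


def parse_single_line_list(line):
    """Return (title, [item]) for a 'title: item' line, otherwise None."""
    match = SINGLE_LINE_LIST.match(line)
    if match is None:
        return None
    return match.group("title"), [match.group("item")]


print(parse_single_line_list("read: great book title"))
print(parse_single_line_list("Read"))
```

Note that this pattern accepts any alphabetic word before the colon, so a line starting with a URL scheme would also match. Restricting the title to a known keyword set would avoid that.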

    Usage

    Let's see how this works on an example note. We start with a note that was imported from Evernote and show its content.

    {% raw %}
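    # Path to an Evernote test note in the test data directory
    evernote_file = PYI_TESTDATA / "notes" / "evernote" / "evernote-test-note-1.html"
    # Read the raw HTML and wrap it in an INote
    txt = read_file(evernote_file)
    note = INote.from_data(content=txt)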
    
    {% endraw %} {% raw %}
    note.show()
    
    INote (#None) <div>
        <div><br clear="none" /></div>
        <div><br clear="none" /></div>
        <ul>
            <li>Buy groceries</li>
            <li>Call john<br clear="none" /></li>
            <li>Do the taxes</li>
            <li>Take out the trash</li>
            <li>Reply to carls mail</li>
        </ul>
        <div><br clear="none" /></div>
        <ul>
            <li>Buy groceries</li>
            <li>Call john<ul>
                    <li>He really needs to pick up</li>
                    <li>Because I need to speak to him</li>
                </ul>
            </li>
            <li>Do the taxes</li>
            <li>Take out the trash</li>
            <li><br clear="none" /></li>
            <li>Reply to carls mail</li>
        </ul>
        <div><br clear="none" /><strong>Buy</strong>: Toothpaste<br clear="none" /><br clear="none" /><em>Read</em>: The age of surveillance capitalism<br clear="none" /><br clear="none" />Watch: Parasite<br clear="none" /><br clear="none" /><u>Do</u>: The dishes<br clear="none" /><br clear="none" />Read</div>
        <ul>
            <li>Twenty one lessons for the 21st century</li>
            <li>Dreams from my Father</li>
        </ul>
        <div><br clear="none" /><strong>Read</strong><br clear="none" />The Great Gatsby<br clear="none" />Alice's Adventures in Wonderland<br clear="none" /><br clear="none" /><strong>Buy</strong><br clear="none" />groceries<br clear="none" />Shoes<br clear="none" /><br clear="none" />Read<br clear="none" />The Great Gatsby<br
                clear="none" />The odyssey<br clear="none" /><br clear="none" /></div>
    </div>
    
    {% endraw %}

    When rendered, this looks as follows:

    {% raw %}
    display(HTML(note.content))
    


    • Buy groceries
    • Call john
    • Do the taxes
    • Take out the trash
    • Reply to carls mail

    • Buy groceries
    • Call john
      • He really needs to pick up
      • Because I need to speak to him
    • Do the taxes
    • Take out the trash

    • Reply to carls mail

    Buy: Toothpaste

    Read: The age of surveillance capitalism

    Watch: Parasite

    Do: The dishes

    Read
    • Twenty one lessons for the 21st century
    • Dreams from my Father

    Read
    The Great Gatsby
    Alice's Adventures in Wonderland

    Buy
    groceries
    Shoes

    Read
    The Great Gatsby
    The odyssey

    {% endraw %}

    We can parse these lists using the HTMLListParser:

    {% raw %}
    parser = HTMLListParser()
    
    {% endraw %} {% raw %}
    lists = parser.get_lists(note)
    lists
    
    [ULNoteList # Untitled 
     Buy groceries
     Call john
     Do the taxes
     Take out the trash
     Reply to carls mail
     ,
     ULNoteList # Untitled 
     Buy groceries
     Do the taxes
     Take out the trash
     Reply to carls mail
     ,
     ULNoteList # Untitled 
     Twenty one lessons for the 21st century
     Dreams from my Father
     ,
     (INoteList) # Buy:  
     Toothpaste
     ,
     (INoteList) # Read:  
     The age of surveillance capitalism
     ,
     (INoteList) # Watch:  
     Parasite
     ,
     (INoteList) # Do:  
     The dishes
     ,
     (INoteList) # Read 
     The Great GatsbyAlice's Adventures in Wonderland
     ,
     (INoteList) # Buy 
     groceriesShoes
     ,
     (INoteList) # Read 
     The Great GatsbyThe odyssey
     ]
    {% endraw %}